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This is the second of two books on machine code for the BBC Micro. 
Because the first, Discovering BBC Micro Machine Code, assumed only 
minimal prior knowledge of the subject, no attempt was made to delve into 
the more difficult (but powerful) addressing modes offered by the 6502 
assembler. Neither was it considered prudent to devote space to hardware, 
even those aspects which directly influence application software such as the 
user port, the Tube and the | MHz bus. 

For continuity reasons, this book, besides repairing some of the 
Omissions mentioned above, treats many of the relevant subjects again but 
in more detail and, as the title suggests, from a more advanced viewpoint. 
The word ‘advanced’ should be interpreted in the relative sense - relative, 
that is, to the level maintained in the first book. 

The elements of binary have been relegated to an appendix to prevent 
polluting the main text yet again. Binary, to machine code enthusiasts, is 
rather like the repeal of the corn laws to students of history — lacking in 
glamour but necessary for continuity. Many microcomputer enthusiasts 
have come from the ranks of the electronic hobbyist and will have an 
interest in computer control of electronic gadgets. However, not all readers 
would share their enthusiasm so, like binary, the treatment of TTL logic 
devices is given in an appendix also. 

The self-test questions may help those who may not be too sure whether 
they have understood what they have read ... a situation quite common 
when reading any ‘explanation’ of microprocessor behaviour patterns. 

Numerous machine code routines are included which should help 
readers to understand the more difficult parts of the text. Some of the 
lengthy programs will be found to have direct practical value, in business, 
educational or leisure fields. However, because it is impossible to anticipate 
bizarre applications, they have been written in a way which should 
encourage individual tailoring. 

The sensible way to employ machine code in the BBC machine is to use it 
in segments within a BASIC program. It would be pointless, and certainly 
masochistic, to write entire programs in machine code. The BASIC in the 
BBC machine is good but there are times when the demands of speed and 
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memory economy justify a temporary leap into machine code and back. 
The programs in this book, which have been well-tested, are primarily 
designed to be used in this way. They are indeed, no more than machine 
code segments. However, in order for them to RUN and to encourage 
experimentation, they have been spliced into an outer BASIC framework. 
As for the choice of programs, no apologies are made for the almost total 
absence of arithmetical routines. Although it is traditional to include 
routines for multiplication, division, etc., it is doubtful whether they are 
of much interest to microcomputer users. 

An entire chapter, however, is devoted to sort routines because this is one 
of the areas where the advantage of machine code over BASIC becomes 
most impressive. 

With regard to graphics, the reader is presented with two options, one 
using the resident service routines and the other using direct screen 
addressing. The latter option is recognised as being contrary to 
establishment guidance but, if there are some beneficial results to justify the 
risk, who cares? 


A. P. and D. J. Stephenson 


Chapter One 


Architecture of the 
BBC Machine 








Background material 


Although not strictly essential, the would-be machine code programmer 
will find it helpful to take some interest in the hardware layout of the 
computer and the historical events which led up to the present design. Such 
interest need not extend to detailed electronics because it is possible to gain 
a fair understanding of the overall system without it. 

The hardware of any microcomputer can be described as a collection of 
integrated circuits (chips) and a few separate (discrete) components such as 
resistors, capacitors and transistors stuck on a printed circuit board (the 
pcb). Communication with the outside world (peripherals) is via an 
assortment of plugs or sockets accessible from the back. Some computers 
may have more memory than others, may have more plugs and sockets and 
perhaps a few more chips than others but it would be difficult to pinpoint 
any profound differences in the hardware design. 


von Neumann’s influence 
A computer, whether it is one of the mainframe giants or a small one 
designed for home use, will in all probability be a‘von Neumann machine’ 
(discussed in Chapter 3). That is to say, it will be designed in accordance 
with fundamental principles laid down by the eminent mathematician of 
that name. Although it has always been fashionable to credit Charles 
Babbage with the ‘discovery’ of the computer, it is questionable whether his 
contraption of cogwheels and levers had any real effect on the evolution of 
the modern computer. It was left to von Neumann to set out the first ‘block 
schematic’, suggest the main data flow paths and the timing sequences 
required to build a practical automatic digital computer. Because of his 
contribution to computer science, John von Neumann (1903-1957) is 
affectionately known in many quarters as the ‘father of the computer’ 
although, like all leaders of a team, he probably received a dispropor- 
tionate share of the credit. 

It is sad that all great men eventually have their greatness disputed. In 
recent years, poor old von Neumann, or rather his principle, has been 
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attacked. It is said by some that computing progress has been stifled by 
slavish adherence to his original concepts of sequential data flow. They 
argue that in spite of the enormous improvements in computer power 
which have taken place over the last four decades, these have been due 
mainly to improvements in computing components rather than develop- 
ments in computing science. To some extent, this is true. In the case of 
internal memory, for instance, there is no fundamental difference between 
the old magnetic core memory with its thousands of ferrite rings wire- 
knitted together and the modern semiconductor RAM chip. Similar 
comparisons can be made between the central processor of the earlier 
machines and the modern variety. The valves gave way to discrete 
transistors which in turn gave way to boards full of logic chips. Eventually, 
the central processor, particularly in the case of the minicomputer and 
microcomputer, became available as the single-chip ‘microprocessor’. In 
spite of all this it would be true to say that apart from enormous reductions 
in cost and size, most modern computers are not profoundly different in 
principle to their World War II ancestors; they are still von Neumann 
machines. 


The computer generations 

Some attempt has been made to classify computers into so-called 
‘generations’. The early machines which used valves belonged to the First 
Generation, those which used discrete transistors became the Second 
Generation. When integrated circuits replaced discretes the Third 
Generation was born (in the mid-sixties). The first integrated circuits 
contained between four to ten simple logic gates per chip which, at the time, 
was heralded as an exciting breakthrough in technology. The most famous 
chip in that era was the 7400 quad NAND gate, manufactured by Texas 
Instruments, and one of a family of chips known as the ‘TTL logic series’. It 
is still going strong at the time of writing. Before the marvel of TTLhad time 
to be digested, Silicon Valley in California produced its next bombshell - 
‘Large Scale Integration’ or LSI which earned the name of the Fourth 
Generation. LSI chips were first produced as semiconductor read/ write 
memories. They were in fact ‘dynamic RAMs’ and were directly responsible 
for the virtual death of the traditional core memory. 


The microprocessor 

In 1971, Intel launched the first microprocessor which, quite unexpectedly, 
changed the entire nature of computing. It was unexpected in the sense that 
the device was never intended to be used in any other way than as acontrol 
element in digital-operated machinery. Instead, the 4004 sparked off a full 
scale development spree as engineers began to appreciate the enormous 
potential of such a device. Silicon Valley became split into fragments, with 
new firms each rushing to improve on the success of the 4004. The first 
improvement was Intel’s 8080, closely followed by Motorola’s 6800 and the 
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Zilog Z80. The microprocessor used in the BBC machine is the 6502 which, 
as we shall see later, may be considered as a modified 6800. Apart from the 
original 4004, which was a four-bit microprocessor, the others mentioned 
above are eight bits wide or, to use technical language, they have a ‘word 
length’ of eight-bits. This means that all data transfers into or out of the 
microprocessor take place in bunches of eight binary bits. In relation to the 
more traditional computers, a word length of eight bits is embarrassingly 
small and places a heavy responsibility on the designer of the software 
operating system. It is complicated process to handle data efficiently, 
particularly when the data is in the form of ‘large’ numbers greater than 255 
decimal. The only way it can be done is to handle numbers by eight-bit 
instalments which is time-consuming and therefore reduces computing 
power. 


16-bit microprocessors 

Over the last few years, a number of 16-bit microprocessors have appeared 
on the market although they have yet to enjoy the low prices resulting from 
mass marketing. Owners of the BBC machine will be aware of the Second 
Processor options, one of which is boasting the presence of a 16-bit 
microprocessor, the 16032. Not only is the 16032 a true 16-bit chip, some of 
the internal registers can handle 32 bits at a time. The resulting power of the 
BBC machine will then be equivalent to a minicomputer, rather than a 
microcomputer. When considering the cost of the 16-bit second processor 
(which admittedly is rather high) it should be remembered that the 
computing power available will be out of this world — at least the 
microcomputing world. Those who are considering the purchase of the 
Second Processor option would be advised to think carefully before buying 
the ‘cheaper’ 8-bit versions. They will be using either another 6502 ora Z80. 
Of course, there will be a great improvement, apart from the extra 64K of 
memory but the improvement will not be revolutionary. The 16-bit version 
will cost more than twice as much but will elevate the system to an entirely 
new dimension. From the viewpoint of the machine code programmer, the 
advantages will be even more apparent. It is dangerous to prophesy future 
market tendencies but it is quite probable that there will be an astonishing 
demand for the 16-bit processor. In fact, it may be a case of history 
repeating itself. The original miscalculation was the false estimate of the 
relative popularities of the Model A and the Model B. In spite of the £100 
extra in price, the demand for the Model B was much higher than for the 
Model A, and was one of the factors which contributed to the chaos during 
the launching year of the BBC system. 


Software development 

Improvements in software have not kept pace with hardware improve- 
ments. It is unlikely that they ever will. Machines are being built with 
frightening power and it is becoming increasingly difficult to produce 
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software of sufficient complexity to exploit fully the hardware available. 
The computing world is currently obsessed with the new breed of 
computers being designed, alleged to have built-in intelligence. These will 
represent the Fifth Generation. Already we are subjected to a new crop of 
buzzwords relating to ‘artificial intelligence’ or AI. Whether or not it is 
sensible to credit a computer (even a fifth generation species) with 
intelligence before even humans have agreed on a definition of their own 
intelligence is questionable. In any case, what is exactly meant by 
‘artificial’? Perhaps we shall soon have artificial plastic! 


Hardware of the BBC Micro 


Although the User Guide and the advertisement brochures list all the 
various input/ output facilities available, it will do no harm to repeat some 
of it here with some extra details aimed at the machine code programmer. 
When programming in BASIC, or indeed any high level language, it is not 
necessary, perhaps not even advisable, to worry much about the mysteries 
under the bonnet. In the case of machine code programming you cannot 
afford to be totally ignorant of the machine itself. Although the resident 
assembler provides some protection from the harsh realities of life beneath 
the keyboard it is still very much a game of battling directly witha primitive 
machine. 


Removing the top cover 

Even if you have no experience with electronic equipment you should not 
hesitate to remove the top cover of the machine and have a good look at 
what lies beneath. It is an easy task if carried out as follows, although it 
should be pointed out that technically the guarantee would be invalidated — 
so be very careful! 


(1) Switch off at the wall and remove the three-pin plug from the socket. 
(2) Wait for half a minute to allow stored charges to decay. 

(3) Locate the two fixing screws at the back of the case (probably marked 
FIX) and remove them. 

(4) Locate the two fixing screws underneath the case at the front, beneath 
the keyboard (probably marked FIX), and remove them. 

(5) Carefully lift away the top case, exposing the circuit board beneath. 


When ready to replace the cover again, take care that you don’t bend or 
damage the three tiny ‘red lights at the left of the space-bar. They are 
supported only by their own connecting wires and it is easy to trap them 
beneath the cover. 


Locating the components 
The dominant sight, when first exposing the interior, is the shiny 
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aluminium (or perhaps steel) box which occupies a largeareaat the left. This 
is the ‘power pack’ which converts the dangerously high (230-240 volts) 
mains supply into low voltages supplying power to the various components 
on the circuit board. The supplies are +5V at 3.75A,-SV at 0.1A, +12V at 
1.25A. All these voltages are with respect to ground (zero volts). Such low 
voltages are not dangerous so all areas of the board outside the metal cased 
power pack offer no electric shock hazards to the human finger. In spite of 
this, it is unwise to poke around too much because your fingers could cause 
damage to some of the delicate components on the board. Some of the 
chips, particularly the large ones, are very susceptible to static charges 
which can accumulate on the fingers although, theoretically, they are 
immune once they are secured in the circuit board. 

At the top right-hand corner is a small metal box marked with the 
manufacturer’s trade mark ASTEC. This is the UHF modulator which, if 
you are relying on a TV as the screen output device, modulates the logic 
voltages from the board onto an ultra high frequency signal, which is 
interpreted by the TV as a bona fide aerial signal. The output is via a 
coaxial cable outlet at the back marked UHF OUT. If you are using a 
purpose-built monitor rather than a TV, there are two alternative outputs. 
One of these is a ‘composite video’ marked VIDEO OUT and is intended to 
accommodate some makes of black and white monitors. The interface 
components include one of the few discrete transistors used. According to 
the circuit details on page 504 of the User Guide it appears impossible to 
obtain colour output unless a small modification is made. The remaining 
screen output is the RGB (Red Green Blue) interface and is the most 
efficient method of energising a colour monitor, providing a direct and 
uncluttered signal. The Microvitec colour monitor would use the RGB 
output. 


Sending characters to screen output 

As far as machine code programming is concerned, irrespective of the 
particular output in use, characters can be sent direct to a screen address by 
use of an STA instruction or, preferably, by OSWRCH. It is appreciated 
that these code words will have little meaning to those who are entirely new 
to machine code. STA in machine code is similar to POKE and OSWRCH 
is a resident operating system subroutine, standing for Operating System 
WRite CHaracter. Our reason for prematurely introducing these machine 
code terms is to emphasise a golden guideline of the BBC machine: 


Always use the official operating system subroutines to send 
or receive data to peripherals. 


This means avoiding absolute addressing of peripherals (no poking). Note 
carefully that this is a guideline and not a rule. The reason behind it is tied 
up with the second processor — should you ever buy one. Any software 
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you write using absolute and addressing of peripherals may not work when 
the second processor is connected. It should be mentioned that direct 
access to screen addresses will produce faster responses than using 
OSWRCH. 


The ROM chips 

Once the top case has been removed, as previously described, the keyboard 
can be lifted off by undoing the two exposed nuts. If the removal is merely 
to examine the components beneath, it is not essential to unplug the 
keyboard. In fact it is unwise to plug and unplug any microcomputer 
connections more frequently than necessary. Beneath the keyboard to the 
right and near the bottom are five sockets for ROM chips. Except for the 
few thousand early models, only two of these sockets will be inhabited. The 
socket at the extreme left houses a 16-pin ROM which is the BASIC. 
language interpreter. Next to this is the Operating System ROM. You 
should ensure that this ROM is the ‘latest Series 1’ model because some of 
the earlier types had a slightly suspect operating system called ‘Version 0.1’. 
It is easy to find out which operating system you have by entering * HELP 
followed by RETURN. The response should be: 


OS 1.20 
A response such as OS 0.1 indicates that the old ROM is resident. 


Bank-switching 

The remaining three sockets are left blank for additional ROMs. There are 
many firms which supply special ROMs which can be plugged into the 
vacant sockets: for example, Acorn’s word processor called ‘View or the 
other popular word processor called ‘Wordwise’ marketed by Computer 
Concepts. There are also several language ROMs available and many 
application ROMs. The total direct addressing space available on the 6502 
microprocessor is 64K. The operating system takes 16K, the BASIC 
language ROM occupies another 16K and the remaining 32K on the Model 
B machine is devoted to RAM. This means there is no direct addressing 
space left for any additional ROMs. The method employed to escape from 
the impasse is a technique known as bank-switching. This allows the 
resident BASIC language ROM to be ‘switched out’ and replaced by one of 
the special ROMs. Thus it is an ‘A or B but not both’ situation. Bank- 
switching is software-controlled which normally defaults to BASIC under 
power-up conditions. However, if additional ROMs are in place, the 
Operating system will be aware of it (or them) and the new default condition 
On power-up is to the ROM in the right-most position. For example, if 
Wordwise is plugged into any one of the vacant sockets, the default 
condition on power-up will be Wordwise instead of BASIC. To switch over 
to BASIC it will be necessary to enter * BASIC or the abbreviated form *B. 
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The response to the BASIC command *HELP will now be modified as 
follows: | 


WORDWISE 


OS 1.20 


The response indicates that Wordwise has priority over BASIC on power- 
up because it is displayed first. If the relative priorities are found in practice 
to be irksome the situation can easily be changed by swopping over the two 
ROMs so that the BASIC ROM is right-most. The unprecedented 
popularity of the BBC machine has stimulated professional programmers 
with the result that many other high-level languages are available in ROM 
form including PASCAL, FORTH and LISP. 


PROM programmers 

An alternative use for the spare sockets is to enable you to insert your own 
EPROMs. They can be bought in blank form; that is to say, they have no 
programs ‘firmed’ in. With the aid of special kits called EPROM 
programmers, it is possible to transfer a machine code program from RAM 
to the EPROM which can then be plugged into one of the spare sockets. 
There is one small snag, however. Make sure that all bugs are removed 
before you transfer it to EPROM because you cannot change parts of the 
program. It is anall or nothing process. If there is a bug in the EPROM, it is 
necessary to erase the entire contents and start again. The erasure process 
requires the chip to be exposed for a certain time in a special box containing 
an ultra-violet lamp. A fully erased EPROM contains ‘1’s in every bit 
position so all the addressed locations contain FF (hex). A discussion on 
hex notation appears in Chapter 3. 


The user port 


The BBC machine has standard interfaces for printers, disk, cassette etc. 
and also caters for individual needs by means of the user port. This is a 
multipin socket outlet accessible from the underside of the case. It is 
completely undedicated and is therefore free to activate any device you 
choose — providing you know how to make the correct connections and are 
able to write suitable software. 

Although the port behaves as an independent set of outlets, it 1s actually 
one half of a complex input/output chip called a Versatile Interface 
Adaptor (VIA for short) bearing the type number 6522. The two halves of 
the VIA are reffered to as the ‘A’ and ‘B’ sides. The B side is dedicated to the 
parallel printer interface and the A side is the user port. The port is 
essentially a ten-wire interface between the computer and the world 
outside. Eight of the wires are used for data and two for controlling the 
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data. Any one of the data wires can be programmed to behave as either an 
input or an output, which obviously adds to the versatility. It is worth 
examining some of the possibilities: 


(1) Assuming a device could be switched ON or OFF by simple logic, we 
could independently control eight of them. They would all be programmed 
initially as outputs. 

(2) Any one of eight devices could send a logic signal back to the computer. 
They would all require initial programming as inputs. 

(3) Three devices could be controlled by the computer and five could send 
signals back. 

(4) By employing simple decoding chips, it would be possible to control any 
one of 256 output devices. Conversely, any one of 256 devices could senda 
signal to the computer. 

(5) Two devices each requiring four inputs could be driven in parallel. 


With regard to the two control wires mentioned above, one is always an 
input but the other can be programmed as an input or output. The input 
control wire can be used to initiate an interrupt sequence. That is to say, an 
input signal can cause the present program to be interrupted and a jump 
made to an entirely different program. (Interrupts are discussed in Chapter 
2.) The use of interrupts can be a hazardous exercise because the entire 
operating system is already controlled by interrupts. For example, when 
the computer is switched on and a flashing cursor is displayed, the machine 
appears docile and in a resting state. This is far from true. In fact there is a 
furious battle going on in the operating system. Every few milliseconds, the 
display system rudely interrupts whatever is going on (most probably 
waiting for a key to be pressed) to repaint the screen picture. The screen 
display only appears stationary to the eye because every single dot, which 
makes up the ‘picture’, is repainted many times per second. Due to the 
persistence of normal vision, a screen picture need only be repainted every 
few milliseconds so there is ample time in between for the computer to carry 
out your orders in a series of interrupted instalments. 


Interrupt requests 


If the order involves the use of a peripheral, such as a cassette tape recorder 
or a printer, it is probable that the operating system will handle them by yet 
another interrupt sequence. The question of relative priorities then arises. 
For example, what should happen if the printer sends a signal to interrupt 
while the system is already in an interrupt condition, such as repainting the 
screen? 

In real life, the needs of certain people have priority claims over the needs 
of others in order to maintain a well-ordered, stable society. A similar 
arrarigement has been found to work well in computing systems. Certain 
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interrupts must be considered to have higher precedence than others and 
permission to interrupt must be granted first rather than allowing a 
disorganised free-for-all. The 6502 microprocessor is fitted with interrupt- 
request logic which, in conjunction with appropriate software, can 
maintain tranquillity. When an interrupt request signal from a peripheral 
appears on a control input line, it is passed, via the VIA, to the 
microprocessor. A certain bit in the processor (called the ‘interrupt mask 
bit’) is examined. If the mask bit is a ‘0’, the interrupt is allowed. If, 
however, the bit is in the ‘I’ state the request is noted but activation is 
delayed until such time as the mask bit is reset to zero. The actual setting or 
resetting of the mask bit is, of course, the responsibility of the programmer. 
We shall see later that the 6502 microprocessor has two instructions for 
defining the state of the mask bit. To set the mask to ‘1’, the instruction is 
SEI (SEt I bit). To reset the mask to ‘0’, the instruction is CLI (CLear I bit). 
The mask is referred to as the ‘I bit’ in 6502 language. 


Serial and parallel interfaces 


Most computers, including the BBC model, use the ASCII code to 
represent characters. Each character requires one byte (a byte is eight 
binary bits). There are two ways of sending these characters along wires to, 
say, a printer some distance away: 


(a) Parallel transmission, in which eight wires are used to send every bit of 
the code simultaneously. 

(b) Serial transmission, in which the eight bits are sent, one behind the 
other, along a single wire. 


It is worth examining the relative merits of the two systems because the 
BBC machine provides provisions for both serial and parallel feeding of 
peripherals. Superficially, it would appear that parallel transmission would 
be much faster, eight times faster than serial. This is true but in most cases, 
particularly where printers are concerned, the difference is seldom of any 
interest because the weakest link in the chain is the printer rather than the 
transmission delay along the wires. Thus, if we run the same printer by 
serial transmission and then change to parallel (assuming the printer allows 
either option) we will find no difference in printing speed. The cogs and 
levers still take up more than 99% of the printing time. Traditionally, serial 
transmission dominated the scene, particularly for feeding the now almost 
obsolete Teletype, a two-way device which combined a sending keyboard, a 
paper tape reader/punch and a printer. It was Centronics, a company 
which specialises in printers, which contributed to the popularity of the 
parallel transmission. In fact, their design for allocating the various control 
functions has been copied and virtually standardised by what is known as 
the Centronics interface. The BBC has a parallel Centronics interface 
although it is simply labelled above the socket as the ‘Printer’. 
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The RS423 interface 

The BBC Micro also has provisions for a serial interface via the socket 
labelled ‘RS423’ which now deserves some explanation. As stated above, the 
Teletype was the standard peripheral in earlier days and the serial interface 
was known as the ‘20mA current loop’ because it demanded a current of 20 
milliamps from the computer to drive the printer relays. This was in the 
days when computers were massive things and 20 milliamps was a 
negligible drain on the relatively enormous power supplies of the time. 
When logic chips arrived on the scene, operating on almost negligible 
current, 20 milliamps just to drive the printer started to become almost 
absurd. One of the results was the formation of a new serial ‘standard’ 
called the RS232 interface which demanded only a small current drive. In 
fact it would be better to consider it as a voltage rather than a current- 
driven system. The RS423 interface is almost identical (in fact compatible) 
with the RS232 but in many respects it is superior. There is always an upper 
limit on the length of wires between the sending and receiving end of a serial 
transmission line. The RS423 allows a longer length than the RS232. 

Although the printer has featured in the above discussion on the RS423, 
it is a general purpose interrupt driven interface, capable of linking any 
device which requires a serial interface. For example, it can be used to 
connect two BBC machines together so they can mutually converse with 
each other. Figure 1.1 shows the interconnections required between the two 
sockets. In addition to the wiring shown, small software routine is 
necessary before the two machines can talk to each other. 

The serial nature of RS423 data requires some rather complex logic 
operations. One difficulty, inherent in all serial transmissions, is the 
method of establishing some form of marker signal to indicate the three 
distinct parts of the message — for example, the quiescent state in between 
characters, the start of a character and its end. This is handled by the ‘start’ 
and ‘stop’ bits at either end of the 8-bit character string. For some time now, 
manufacturers of microprocessors have always supplied ready-made 


RS423 RS423 
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Fig. 1.1. Connecting two BBC machines together. 
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solutions to such problems in the shape of specialised chips. The common 
name for the class of chip which handles the serial transmission of data is 
‘Asynchronous Serial Interface Adaptor (ASIA for short). The particular 
species in the BBC machine is the 6850. This chip converts the 8-bit parallel 
output from the computer data bus into serial form. It also generates the 
start and stop bits, together with the correct timing circuitry. The 685C is to 
serial interfacing as the 6522 is to parallel; that is to say, the ASIA and the 
VIA are serial and parallel interfaces respectively. 


Analogue to digital conversion 


Digital computers are designed to operate in a restricted, but nevertheless 
predictable, swo-staie environment. Voltages are either in the ‘1’ state 
(about +5 volts) or in the ‘0’ state (about 0 volts). Conditions in the world 
outside have no such restrictions. Physical variables, such as wind, 
temperature, pressure, electrical voltages etc., can assume a wide range of 
values. A voltage which somehow is made proportional to a particular 
physical quantity is called a voltage analogue of that quantity. Thus, if a 
wind velocity over the range 0 to 100 miles per hour were represented by a 
voltage between 0 and 10, the scaling factor would be 10 miles per volt. The 
particular gadget which converted the wind speed to voltage would be 
termed a transducer and, in its most simple form, could be an electrical 
generator with the shaft driven by wind blades. Linearity of the scale would 
obviously depend on a strictly proportional relationship between shaft 
speed and output voltage. However, a program could easily be written to 
account for non-linearity in any transducer. 

The BBC machine has a special analogue to digital input socket marked 
‘Analogue In’. It enables any one of four analogue input channels to be 
converted to a digital number. Unfortunately, this facility, if some of the 
popular computing magazines are to be believed, appears to be used almost 
entirely for waggling games paddles. This is a pity because the interface is 
suitable for a wide range of applications. Figure 1.2 shows how to wire upa 
simple circuit in order to experiment with the converter. 

The top diagram, Fig. 1.2(a), shows the connections between a 10K 
potentiometer (known in electronic circles as a ‘pot’) and the 15-pin 
Analogue-In socket at the rear of the machine. It enables any voltage 
between 0 volts and a nominal 1.8 volts to be applied to the analogue-to- 
digital input by twiddling the knob on the pot. A word of warning is not 
out of place here. The wiring from the pot should never be soldered 
directly to the pins on the computer socket. Always use a plug and 
socket connection. There are many firms that supply the correct socket 
already attached to a ribbon cable. Soldering direct to the machine’s 
Analogue-In socket is crude and unprofessional. In fact this warning 
applies to all external connections because there is always a danger of the 
leads shorting together by wisps of solder. The output from the pot is 
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(a) Analogue in socket 
O05 oO4 O33 O2 O01 










015 O14 O13; O12 O11 O10 O89 





10K Linear pot 
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+ 5 volts 


1.8 volts 





ay ae ee Analogue ground 


Fig. 71.2. Analogue to digital hook-up. 


connected to pin 15 whichis Channel 0 although there is no reason why any 
of the other three possible channels could not be used. 

Figure 1.2(b) is for the benefit of those who feel happier if they know 
what they are doing. The circuitry to the left of the dotted line is within the 
computer. The 1.8 voltage reference line is obtained internally from a 
divider chain across the +5 volt supply to ground. The three silicon diodes 
are in series with each other, providing a total forward voltage drop of 
about 1.8 volts. Silicon diodes have the useful property of dropping about 
0.6 volts when in forward conduction and within reason, irrespective of the 
current through them. However, it would be unreasonable to expect 
exactly 1.8 volts output to the pot. 

The wiring can be tested out with a simple few lines in BASIC as 
described under the keyword ADVAL on pages 202-204 in the User Guide. 
(Machine code versions appear later.) 
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Using the four channels 

The chip performing the analogue conversion to digital is an interrupt- 
driven D7002: a standard, but nevertheless sophisticated, component. It 
converts the analogue input voltage into a 12-bit binary number so each 
additional bit increases the number by an increment of 1/4096. It allows 
any one of four input voltage channels to be measured. Figure 1.3 shows 
how to experiment with four controlling pots. 

The four pots are in parallel across the reference supply (pin 11 and 5) 
with sliders connected to the channel input, pins 15, 7, 12 and 4. If 10K pots 
are used, the total load of 2.5K is a little on the heavy side. The current 
limiting resistance, R71, shown previously in Fig. 1.2, causes the voltage to 


Channel 1 Channel 2 Channel 3 Channel 4 





Fig. 71.3 Four-channel control. 


the stabilising diodes to be pulled down to a rather low value of 2.5 volts. 
The next highest preferred value of pot would be a reasonable compromise. 
The four controls can be tested immediately by the following simple 
BASIC lines: 


10 MODE 7:Channel=1 

2O REPEAT 

30 Anal ogue=ADVAL (Channel ) 

40 PRINT TAB(S, 10+Channel)"Channel nu 
mber ";Channel , Analogue 

YO K=INKEY (100) 

60 Channel=Channel+i 

70 IF Channel >4 THEN Channel=i 

80 UNTIL 2=3 — 


The program will continuously display all four channel readings in the 
form of 5-digit decimal numbers, the readings changing as the pots are 
varied. Line 50 is to slow up the changes to prevent blurring. It is interesting 
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to try out the above even if you have not wired anything to the Analogue-In 
socket. The display will show four numbers, varying in the region of 50000. 
The reason for this is the open-circuit condition of the input channels 
causing an indeterminate ‘floating-to-high’ state. As the User Guide 
explains (page 202), the internal operating system has allowed for possible 
replacement of the D7002 by a higher resolution version, so the digitised 
range appears as 0 to 65520 instead of 0 to 4095. Asa result, the readings go 
up in increments of 16 rather than I. 


Floppy disk controller 


Floppy disks pose a greater interfacing problem to both the hardware and 
software engineer than the relatively simple cassette tape backing store. A 
special disk controller chip, the 8271, is responsible for the primary 
hardware interface. The repertoire of extra commands required to make 
full use of the disk filing system is buried in a special ROM, known as the 
DOS (Disk Operating System) or the DFS (Disk Filing System). The 
necessary power to drive the disk motor and the disk electronics is supplied 
from a 40-pin socket beneath the keyboard. 


Memory mapping and page numbers 

The peripheral devices so far discussed are standard to the model B, with 
the exception of the disk interface components which are optional extras. 
All peripherals are memory-mapped, meaning they are all accessed as if 
they were normal memory locations rather than responding to special 
machine code instructions. Thus, the floppy disk controller chip ‘resides’ at 
the five memory locations &FE80, &FE81, &FE82, &FE83 and & FE84. 
The ‘&’ prefix indicates that the numbers are in hexadecimal rather than 
decimal. The hexadecimal (hex for short) counting system is widely used in 
machine code work, particularly when referring to machine addresses. If 
hex is unfamiliar to you, see page 71 of the User Guide or skip to Appendix 
A of this book. A complete hexadecimal address consists of four hex digits. 
The first two digits are best thought of as the page number and the second 
two digits as the position on the page. Referring back to the five addresses 
occupied by the floppy disk controller, we could say they are all on page 
&FE of the memory map. 


Sheila addresses 


The machine addresses for all the peripheral interface chips so far discussed 
are on page &FE. This particular page has been given the rather 
charming name of Sheila. Thus the Sheila address band is between & FEO0 
to &FEFF. The positions on the page are allocated as follows: 
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6845 CRT controller: &FE00 to & FEO! 
In BASIC, this is accessed by the VDU 23 command. For example, VDU 
23; R, V: 0; 0; 0 places the value V into register R of the CRT controller. 


6845 ASIA: & FE08 to & FEO9 
This is where the parallel/serial conversion is carried out when accessing 
the cassette tape or the RS423. 


Serial ULA: &FEI0O 

ULA stands for ‘Uncommitted Logic Array’ which would imply it can be 
used for anything! It is one of the new breed of ‘miracle’ chips, containing 
the groundwork necessary to build a logic system of any desired form. In 
the initial stages of manufacture, it is rightly named an uncommitted logic 
array. However, before it leaves the factory the customer supplies further 
information which turns the previously uncommitted array into a 
conglomerate of committed functions. Thus, although the name ULA 
sticks, those in the BBC machine are certainly committed arrays. This 
particular ULA helps in the organisation of serial peripheral devices. 


Video ULA: & FE20 to & FE21 
The logic buried in this chip is responsible for much of the superb graphic 
and colour facilities available. 


Paged ROM controller : &FE30 
This is a simple decoder chip (74LS161) used to switch over the paged 
ROMs referred to earlier. 


The internal VIA : &FE40 to & FE4F 

There are two 6522 VIAs in the machine. This one is designated ‘internal’ 
because it is used for several purposes, indirectly concerned with the 
control of input and output. 


The external VIA : & FE60 to &FE6F 

This 6522 is dedicated to the parallel printer interface (Centronics) and the 
user port. The 6522 VIA consists of two, almost identical, halves. The ‘A’ 
side is committed to the printer and the ‘B’ side to the user port. As can be 
seen from the address range, the 6522 requires sixteen machine addresses in 
order to make full use of it. One function carried out on the ‘A’ side is the 
interrupt-driven clock used in the TIME keyword when in BASIC. The 
timers on the ‘B’ side are available for users’ programs. 


The floppy disk controller : &FE80 to & FE8&4 
This is the home of the 8271 chip mentioned earlier. 


Data link controller : &FEA0 to &FEA3 

This is an advanced chip containing much of the logic required to control 
the Econet communications interface. It is part of the optional extras 
offered. 
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Analogue to digital converter : &FECO to &FEC2 
This is the D7002 chip previously discussed. 


At this point, it is worth repeating the earlier warning regarding the 
dangers of directly addressing any of these machine locations. However, 
the warning was in the nature of advice rather than a rule. There is nothing 
illegal in direct accessing the peripheral devices although, apart from the 
user port, it is unlikely that you will ever find the need to circumvent the 
resident operating system subroutines. The above information and 
addresses were primarily intended as background information. However, 
there will always be a few of the more intrepid readers who may feel a desire 
to ‘improve’ on the operating system, even at the risk of crashing or 
jeopardising the smooth running of a future second-processor. 


Fred addresses and the 1 MHz Bus 


Sheila addresses are concerned with what may be broadly described as 
standard peripherals. The BBC machine, however, caters for more 
ambitious schemes, such as Teletext, Prestel, dealer’s test kits, etc. There is 
a multipin socket beneath the machine known as the ‘| MHz Bus’ which 
caters for these optional additions to the system. These are allotted 
addresses in the band & FC00 to & FCFF. This band (page & FC) is named 
Fred, a less charming name than Sheila but still quite novel. The individual 
allocations within Fred are as follows: 


Dealer’s test equipment : & FC00 to & FCOF 

Testing and fault diagnosis of the BBC machine is rendered easier for 
dealers by a special box of tricks known as PET (no connection with 
Commodore’s famous family of micros). PET stands for Progressive 
Establishment Tester and occupies sixteen locations within Fred’s band. 


Teletext : &FC10 to &FCI3 

Most readers will know that a Teletext adapter can be bought, enabling a 
range of free software to be down-loaded directly into the RAM locations 
from BBC transmissions. It is also possible to call up and incorporate any 
of the standard Teletext pages in your own programs. An updated list of 
the software available currently appears on page 701 of Teletext. 


Prestel : &FC14 to &FCIF 

Prestel is the Rolls Royce version of Teletext, providing access to an 
enormous, and still growing, storehouse of information. The information 
comes via the normal telephone system so it requires a modem (modulator 
and demodulator). A modem provides the necessary conversion from 
computer logic signals to audio tones, suitable for passing down a 
telephone line. The coupling is acoustical, rather than wired, in order to 
circumvent the rules relating to unauthorised interference with telephone 
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equipment. Prestel is two-way in action, meaning you can also send your 
own messages back to Prestel. There is one snag of course- you have to pay for 
telephone calls and also an extra charge on top for certain of the Prestel pages. 


Test equipment : &FC80 to &FC8F 
Some more allocations for testing purposes. 


Reserved for user’s applications : &FCC0 to & FCFE 


There is an ever-growing range of hardware available designed to plug into 
the | MHz bus. As can be seen from above, a generous range of addresses 
are left vacant for the purpose, sixty-three in fact. 


ROM paging register : &FCFF 


This is a single byte address used for paging different ROMs. (See JIM 
address band.) 


JIM addresses and the 1 MHz bus 


The address band &FD00 to &FDFF is called JIM and is primarily 
designed for connecting up alternative memory chips (up to 64K of RAM 
or ROM) via the | MHz bus. Note that the word ‘alternative’ is used rather 
than ‘extra’ because the full 64K can only be page-accessed in competition 
with the resident 64K. Connecting up the memory Is not a task you should 
attempt unless you feel confident or, preferably, have gained some 
experience with decoding logic (some general information on logic can be 
found in Appendix A). In the meantime, it is worth mentioning that the 
solitary address & FCFF, called the ‘paging register’ in the Fred band, will 
be found to have important significance to the decoding network. 


Wiring details of the | MHz bus 


Page 503 of the User Guide gives a diagram of the connections to the | 
M Hz bus. However, a scaled down version is shown in Fig. 1.4 which tends 
to emphasise the decoding problem. 

There are several abbreviations in the diagram which may demand 
translation. One of the more depressing features of computer hardware is 
the proliferation of abbreviations which designers use. To the designer, the 
letter groups may be ‘plain English’ because of familiarisation. To the non- 
expert, they present a fearsome obstacle to progress, made worse bythe lack 
of standardisation. 

The | MHz socket pins are labelled the same as on page 503 of the User 
Guide but make no attempt to show the correct orientation of the 
corresponding pin numbers. You must refer to the original diagram in the 
User Guide if you intend to use the bus. The position of the 6502 
microprocessor has been included in order to gain an overall perspective of 
the bus. Discussion of the wiring can be treated under three headings, the 
address bus, the data bus and finally the control lines. 
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The address bus 

There are 16 wires on the address bus of the 6502 microprocessor, labelled 
AO to Al5. Only the 8 lower order wires, however, AO to A7 are brought 
out to the | MHz bus. In order to provide the bus with high current drive, a 
74LS244 non-inverting buffer (IC71) is used. The chip has no logic 
function but the wires are electrically stronger when they emerge. The 
address bus is one-way only. That is to say, signals can only pass from the 
microprocessor. 


The data bus 
The 8 wires, DO to D7, from the microprocessor pass the 1 MHz bus via a 


two-way non-inverting buffer (1C72). This is a 74LS245, a more 
complicated buffer than the one used on the address bus because it must 
allow signals to pass to and from the microprocessor. Obviously, it cannot 
pass signals both ways at once So it must be controlled to either READ 
(pass data to the microprocessor) or WRITE (pass data from the 
microprocessor). The control terminal on the chip is labelled T/R which 
means “Transmit/ Receive’. Why wasn’t it labelled R/ W to make it tie up 
with READ and WRITE? The answer is due to the general purpose nature 
of logic chip design. A bidirectional buffer could have many uses apart 
from reading and writing to and from memory so it uses the wider terms 
‘transmit/ receive’. Another control on the buffer is labelled CE which 
stands for ‘Chip Enable’. This allows the entire chip to be switched 
ON/OFF or, using established jargon, enabled or disabled. Thus, the 
buffer can be placed in any one of three states, the READ state, the WRITE 
state or completely OFF altogether. When in the OFF state, the 
microprocessor data bus is completely unaware of its existence. It is said to 
be ‘floating’. 


The control lines 

The address and data buses are well-defined entities. By comparison, the 
control lines of any computer system always seem an unruly mess. Odd- 
looking single lines carrying funny labels appear to wander around 
different parts of the microcomputer in an undisciplined manner. The sheer 
complexity of the complete microcomputer wiring means that only parts of 
it can be shown at once. This means that some of the control lines appear to 
end abruptly, only to appear again on another diagram, apparently starting 
from nowhere! To assist bewilderment, some of the address wires may be 
used to ‘control’ and data wires may even be used for addressing purposes. 
There are no short-cuts to understanding, it’s merely a case of methodical 
plodding through each control and gradually becoming accustomed to the 
strange-looking abbreviations. Referring to Fig. 1.4 again and dealing with 
the easier parts first: 


The power lines 
The bus supplies a +5V and a OV line for your external use although it is 
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not to be treated as an inexhaustible source of power. However, it should 
be ample for running extra memory and/ ora reasonable collection of logic 
chips. 


Reset line (labelled NRST) 

It isimportant that all parts of the computer and any equipment you mayadd 
on to the | MHz bus begin in step with each other. The microprocessor 
sends out a master reset signal which, in 6502 data sheets, is labelled RST. 
The bar over the top is a well-standardised method of indicating reverse 
logic, the bar standing for the word ‘not’, indicating that reset action will 
occur on a logic 0 rather than a logic | (for further details on logic, see 
Appendix A). For some obscure reason, the | MHz bus uses the prefix ‘N’ 
instead of the bar, to indicate reverse logic. So the line which starts off from 
the microprocessor as RST, emerges out of the bus as NRST (the N 
meaning ‘not’). 


Read/ write control (labelled R/ NW) 

The 6502 microprocessor labels the line R/ W. This indicates that a logic 1 
on the line cause a READ and a logic 0 causes a WRITE. Note it is used to 
control the T/R terminal of the two-way buffer as well as providing an 
external line to the | MHz bus. 


The | MHz clock (labelled 1 MHzE) 

The source of the oscillator, known as the ‘clock’, is not shown. It may be 
required for external projects which require timing synchronisation with 
the resident clock system. It is worth mentioning at this point that the 
resident ‘master clock’ is 2 MHz, so external projects run at half speed. This 
is not a bad thing because the lower the frequency, the less critical you need 
be on the choice of components. 


Analogue in 

This allows an additional signal to be picked up and fed to the audio 
circuits of the computer. Any signal here is superimposed on any other 
audio signals. It is not an extra channel. The rms signal voltage must not 
exceed 3 millivolts or distortion and possibly damage may occur. 


Interrupt request (labelled NIRQ) 

This caters for external projects which rely on interrupt procedures, such as 
an extra 6522 VIA, or perhaps the more simple 6520 PIA. It enters the 
microprocessor under the label IRQ which means Interrupt ReQuest. As 
the bar over the top shows, it recognises only a logic 0. Although not shown 
on the diagram, there are several other contenders for interrupt. Because 
there is only a single line available, the various interrupt inputs are 
normally wire-ored (see Appendix A). 


Non-maskable interrupt (labelled as NNMI) 
This is brought out for external use but using it is fraught with danger of a 
crash. -A signal on this line could endanger the operating system which 
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expects to have exclusive rights to it. You are strongly advised to forget that 
it even exists! 


Fred and JIM (labelled as NPGFC and NPGFD respectively) 

NPGFD stands for Not PaGE FD, indicating that it is logic 0 active. 
Earlier discussions on FRED and JIM described them as particular pages 
(&FC and & FD) in memory. It now seems odd that single wires should 
have similar names. The answer is that the data lines DO to D7 on the 
| MHz bus must only be activated when either of these pages are addressed 
~ that is to say, only when the higher order pair of the hex digit addresses are 
FC or FD. Figure 1.4 gives no indication of the source of the FRED and 
JIM wires, but it is evident that they must somehow be fed from a decoder 
which senses when the higher order address lines (A8 to A15) carry the 
binary code corresponding to & FC or & FD. Full technical drawings of the 
BBC machine are difficult to obtain, so Fig. 1.5 is offered as one way the 
decoding might have been arranged. 
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Fig. 1.5. Decoder for producing Fred and Jim. 


The decoding uses NAND gates which deliver a logic 0 only when all 
inputs are at logic 1. The binary pattern for & FC is 1111 1100 so two logic 
inverters are necessary from lines A8 and A9. The binary pattern for &FD 
is 1111 1101 so only one logic inverter is required from address line A9. 

Referring back to Fig. 1.4, the Jim and Fred lines are fed, via two diodes, 
to the chip enable (CE) pin on the two-way data buffer. Note that this pin 
has a tiny ‘bubble’ at the input which is a standard symbol indicating that 
reverse logic is required. (This explains why a NAND instead of an AND 
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gate was used in Fig. 1.5.) Thus, the data bus is only routed to the 1 MHz 
bus when the chip is enabled by prior selection of Fred or Jim. 

A few words of explanation may be needed with regard to the two 
diodes. In conjunction with the 3.3K resistor (R108), they form an OR 
gate for reverse logic. The register normally pulls CE high to +5V, 
disabling the buffer chip. When a logic 0 from either Fred or Jim appears at 
the cathodes, the appropriate diode conducts and pulls the CE pin down 
towards 0 volts which enables it. 


Suggestions for 1 MHz bus projects 


The previous explanation of the bus will be interesting only to those who 
use, or have a yearning to construct their own extension systems. It is 
surprising how newcomers to machine code often become more interested 
in hardware projects even if they have previously showed little curiosity. 
This is because machine code programming probably creates more interest 
in computing theory and logic systems than BASIC or most high level 
languages. The 1 MHz bus is beautifully designed for expanding the 
capabilities of the machine and experimenting. It would bea shame if your 
socket is always left vacant and collecting dust. Assuming you have never 
before tackled logic design or construction, the following advice may get 
you started: 


(1) Obtain a copy of Application Note 1: The 1 MHz Bus from Acorn. This 
gives a well detailed description of the bus although you may find it a little 
heavy-going in parts. 

(2) Scan through magazines which cater for both hardware and software. 
Electronics and Computing is a good example, and it is usually refreshingly 
free from articles by games fanatics. 

(3) Study Appendix A of this book and supplement it with one of the many 
books available on logic theory. Ian Sinclair’s Beginner’s Guide to Digital 
Electronics is excellent. 

(4) Get a friend, or become friendly with someone, who is knowledgeable in 
the field of electronics. There are usually one or two of these types lurking 
around your local computer club. However clever you become at logic 
design, you may require some expertise in electrical matters, if only to 
avoid damage to the equipment. Soldering plugs, sockets and IC pins is not 
quite the simple exercise we are sometimes led to believe. 


Almost any device can be switched ON or OFF by computer instructions 
providing it is controlled by a logic | ora logic 0. The device can bea simple 
lamp, the jib or winding gear of a crane, the points on a model railway or 
even one of the many chips available in the ‘74 series described in 
Appendix A. From now on, it will be assumed that the term ‘device’ is /ogic 
controlled so it is unnecessary to burden the mind with its particular 
function. 
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Address decoding 


If there are several different devices, each one must have an associated code 
acting as anidentifier. This code will be the device ‘address’. The | MHz bus 
is equipped with address wires AO to A7 whichcan be tapped for supplying 
the two /east significant hex digits of the address. The two most significant 
hex digits have already been decided for us since the | MHz bus is restricted 
to page FC for device control. As explained previously, the line marked 
NPGFC is available on the bus to enforce this page. There are 256 different 
address codes possible on one page according to the binary pattern set up 
on AO to A7. That is to say, the address range of our page FC extends in 
theory from & FC00 to &FCFF. However, we should restrict the coding of 
our special devices to the small band set aside for User Applications, 
&FCCO to &FCFE. It is comforting to know that decoding AO to A7 1s 
easily achieved by special decoder chips available in the 74 series. These 
decoders are general purpose but skilfully designed for slipping neatly into 
a variety of systems. Figure 1.6 shows a possible hook up for driving any 
one of eight devices. 


VYA7 A5 A3 At 
A6 | A4| A2| AO NPGFC OV +5V 


FCFO 
FCF1 
FCF2 
FCF3 
FCF4 
FCF5 
FCF6 
FCF7 





Fig. 1.6. Selecting any one of eight devices. 


The 74LS138 is a well-known decoder chip. It has eight individual 
Output lines, only one of which can be ‘activated’ at any one time. The 
output chosen to be active depends on the binary pattern applied to the 
three select lines. However, whatever the select pattern applied, the chip 
will still not be functional until all three enab/e lines are correctly driven. 
The combination of selection and enabling can be used to provide unique 
address decoding for each of the outputs. Working out address decoding is 





24 Advanced Machine Code Techniques for The BBC Micro 


an art which is essential for those who intend to design their own add-on 
boards, so the following step-by-step explanation of Fig. 1.6 is worth 
studying carefully. The explanation is limited to one of the outputs (pin 15) 
labelled as address FCFO: 


@ The line NPGFC, already supplied from the 1 MHz bus, is passed 
directly to pin 5, pulling it down to 0 V and, because this pin has a ‘bubble’, 
is correctly enabled. Thus, apart from anything else, the chip only comes 
alive when the first two hex digits of the address are FC (Fred in other 
words). 

@ The address lines A4, A5, A6and A7 provide the inputs toa NAND gate. 
This only gives logic 0 out if all four inputs are logic 1. But pin 6 requires 
a logic | to enable it (no bubble at input) so the NAND gate output 
is inverted in order to correctly enable pin 6. So we conclude that the 
third hex digit of the address must be F (because 1111 is F) to ensure that 
pin 6 is enabled. We have now established the first three hex digits of the 
address FCFO. 

@ Ifthe last hex digit is 0, the four address wires AO,A1,A2 and A3 must all 
be logic 0. Note that A3 is connected to pin 4 which is a bubble input and 
enabled only by a logic 0. The remaining three address lines are connected to 
the select pins. The internal design of the 74LS138 is such that logic 0 on all 
three select pins activitates only pin 15 and is therefore correctly labelled FCFO. 


Thus, to switch ON any device connected to pin 15 of the decoder chip, all 
we have to do is to specify address &FCFO. 

If all this appears difficult, it may help if you-pencilled in the binary 
pattern for FO on the eight address lines as follows: 


A7 A6 AS A4 A3 A2 AI AO 
I I I | 4.482 * Gia Ge 


Tracing the ‘l’s and ‘0’s back to the decoding chip will confirm that the chip 
is fully enabled (with pin 15 activated) only by the address & FCFO. 
What about the other seven outputs? The only difference will be the 
pattern on the three select lines driven from AO, Al and A2. The internal 
design of the decoder chip ensures the following behaviour: | 





A2 Al AO _ Output selected (hex address) 


0 0 0&2 pin ld FCFO0 
0 O 1 pin 14 FCFI 
0 ft § .om is FCF2 
G4 1 pin 12 PCF3 
| UO «0 pin. tl FCF4 
; ~~ 1 ap 19 FCF5 
| 1 0 pin 9 rors 
I I 1 pia? PCF] 
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Although Fig. 1.6 has been built around the addresses & FCF0O to & FCF7, 
we could have chosen any other set of eight contiguous addresses in Fred’s 
user application band (&FCC0O to &FCFE). For example, suppose for 
some reason we wished to position the decoder in the eight addresses 
&FCEO to &FCE7. The third hex digit would now be E (binary 1110) 
instead of F (binary 1111). The only change, therefore, would bea logic0O on 
address line A4 instead of a logic 1. Thus the circuit remains the same 
except for one extra inverter in between A4 and the NAND gate. 

Note from Fig. 1.6 that the data bus is not involved. Switching on the 
selected advice is simply a case of mentioning the address in a machine code 
instruction; any data transfers by READ or WRITE action are incidental 
to device action. We could say that the device is ‘address’ rather than ‘data’ 
driven. 


Sending parallel data externally 


Some devices only require a single logic line to activate them (simple lamp 
or electric motor). Others, may require a number of data lines as, for 
example, an 8-bit digital to analogue converter. Driving such devices 
would involve the data bus as well as the address bus. As far as address 


pe oe ee me oS — = = \ 
y 07 DS D3 D1 +5V OV y 
D6 | D4 | D2 | DO 







Parallel data out yo> 





Any decoded line 
(example FCFO) 


> s Cy Clock Enable 







8 bit latch OV 





Parallel data out 


Fig. 1.7. An 8-bit latch as an example of a device. 
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decoding is concerned, we could enable the device from any one of the 
outputs in Fig. 1.6 or, in fact, up to eight separate ones if the necessity 
arises. Figure 1.7 shows how the device-select action is combined with the 
data bus. 

An 8-bit latch is a kind of buffer between the data source (in this case the 
data bus) and the device which requires it. The device only accepts the data 
when a terminal called the clock changes state. Because the clock input in 
Fig. 1.7 has a bubble, clocking the data takes place when the change is from 
logic | to logic 0. In the quiescent state (device not yet selected) the parallel 
data out is that which remained from the last time the latch was clocked: it 
is ‘historic’ data. However, whenever & FCFO is addressed (goes from logic 
0 to logic 1), the inverter in the line causes the clock to go from logic | to 
logic 0. The information on the data bus then passes through to the parallel 
output lines. To summarise: when &FCFO is addressed, the device 
immediately receives new information from the data bus, overwriting the 
previous information. As a matter of interest, the machine code line to 
place the contents of the accumulator into the output lines is: 


STA &FCFO 


The system recognises the instruction is to act on the | MHz bus devices 
because the first two hex digits of the address is the Fred page. 


Pilar... 1MHzbus — eas” 
y 07 D5 D3 D1 +5V OVy 
D6 | D4 | D2 | DO 


FCF2 
(or any other) 
Enable 


Digital to analogue converter 





Analogue out 


Fig. 1.8. Converting digital information to analogue output. 
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Figure 1.8 is another example of a device requiring the data bus, which is 
needed, of course, for receiving the digital data. ) 

Although the BBC machine is not equipped with a digital to analogue 
converter, it is easy to hook one on the | MHz bus. Fhere are several types 
available so the diagram contains only sufficient detail for the under- 
standing of the system. The diagram shows the data bus directly connected 
to the D/ A converter and enabled by oné of the decdded addresses. This, of 
course, could be one of the lines from/a 74LS138 (as shown earlier in Fig. 
1.6). An instruction to produce analogue output could be: 


SA aPCr2 







This will cause the output voltage to assume a value proportional to the 
digital number in the accumulator. 


Using the R/NW line 


Up to this point, the devices described have all delivered outputs, from the 
computer to the outside world. However, many systems may include one or 
more input devices passing data from the outside world to the computer. 
One obvious example is an analogue to digital converter. There are two 
dangers inherent in the design of input devices. The first is obvious: make 
sure that the voltages coming from the outside world are within the allowed 
maximum (usually within the range 0 V to 5 V). The second danger is more 
obscure. If an input device is enabled, it normally places data on the 
address bus. Thus the instruction should READ from the data bus (an 
LDA for example). But if, by mistake, we WRITE (an STA, for example) 
there is a conflict of loyalties on the data bus because two independent sets 
of data are trying to gain control. This could damage some of the circuits. 
The remedy is either to become an infallible programmer (we have never 
yet met one) or incorporate something on the lines shown in Fig. 1.9. 

In Fig. 1.9,a NAND gate is fed by the R/ NW line and the decoded device 
address. The output from the gate is logic 0 only if both inputs are logic 1. 
Therefore, even if the device address is correct (FCF4 in the example) the 
R/NW line must be logic | which is present only if the instruction is to 
READ. If, in error, the programmer causes a WRITE action, the R/ NW 
line is turned to logic 0 and the NAND output rises to logic 1. Because the 
device in Fig. 1.9 has a bubble at the enable input, the input device is not 
enabled. Thus, it is now impossible to jam the data bus by trying to write to 
an input device. If the device had no enable bubble (and some of them 
haven’t), it would be necessary to include an inverter in the NAND output 
line. This latter point illustrates how easy it is to be ‘one inversion out’ when 
designing logic systems. You can never be just a little bit out (as in analogue 
design). It is either dead right or dead wrong! 
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LJ Ui Enable 


FCF4 
(or any other) 


Any input device 
Fig. 1.9. Gating the READ/WRITE line to input devices. 


Using Jim to access auxiliary memory 


Fred is used for attaching a variety of extra gadgets to the | MHz bus. Jim 
is virtually dedicated to the sole purpose of adding on auxiliary blocks of 
memory. Up to the full 64K of RAM, ROM or mixtures of RAM and 
ROM can be page accessed. For full details, it is advisable to obtain the 
Acorn leaflet on the 1 MHz bus but, in the meantime, Fig. 1.10 shows the 
general idea. 

As mentioned previously, a sixteen-wire bus is required to cover a 
64K memory system. Only eight-address wires, AO to A7, are provided on 
the 1 MHz bus so it is not surprising that the solution to such a problem 
involves some strange goings-on. Examining Fig. 1.10 reveals that the /ower 
order addresses are supplied by the legitimate address bus. The higher 
order address, A8 to A15 is obtained by ‘borrowing’ the data bus. How can 
the data bus (which is required to pass data to an address) simultaneously 
supply part of the address as well? The simple answer is that it can’t. The 
secret lies in the 8-bit latch. The programmer first supplies the higher order 
address (the page). This is then automatically clocked into the latch, 
supplying A8 to AI5 in the ROM/RAM. The figure shows that latch 
clocking is achieved by the special decoded line FCFF. It also shows that 
the entire ROM/RAM memory structure is only enabled by using the Jim 
address band, detected by the NPGFD line on the | MHz bus. Although 
the enabling pin is marked ‘valid memory address’ (instead of ‘enable’), the 
effect is identical. 

There is naturally a small penalty to pay for all this jiggery pokery. It 
takes a little longer to read or write to a location in the extended memory, 
particularly if the desired address is outside the current page (outside the 
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pont nreoror rer eC eo Mea ie ~N 
Y 
A7 AO D7 DO RINW 








8 bit latch 


valid piw 
address 
64K 
ROM/RAM 
A15 





ey 





Fig. 1.10. Outline scheme for 64K of paged ROM/RAM. 


‘page boundary’). Some provisional details of the coding are now given in 
the hope of clarifying some of these points: 


Programming procedure: 

@ First store the page part of the address in the reserved location & FCFF 
(called the paging register). 

@ [hen perform the desired data transfer to any position on that page by 
reading or writing into the address band &FD00 to &FDFF. 
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Example: to store the number 13 hex in address &3567 of the extended 
memory. 


LDA #&13 \load A with &13 

LDX #&35 

STX &FCFF \store page address in paging register 
STA &FD67_ \ store A in address &67 of page &35 


The paging register location, & FCFF is the odd man out in the Fred band 
because it is used for Jim. 

It is unnecessary to keep loading the paging register each time, providing 
the addresses lie on the same page. If the next address lies on a different 
page, then the paging register must be reloaded. 


Summary 


1. Machine code programming can be tackled with more confidence if 
backed by some knowledge of hardware. 

2. If your machine still has the old 0.1 series ROM, get it changed to the 
1.2 or any of the 1.X series as soon as possible. 

3. The priority on power-up is vested in the ROM occupying the 
rightmost of the five sockets. 

4. The designer’s advice is to access the peripheral interfaces by resident 
subroutines, OSBYTE, OSWORD, etc. This is good advice but need 
not be taken too seriously during the learning stage of machine code 
programming. 

5. The operating system handles standard peripherals by a process of 
orderly interrupts. 

6. An interrupt request (IRQ) is granted only if the mask bit (in the 
microprocessor) has not been set to ‘Il’. A non-maskable interrupt, or 
NMI ignores the mask bit. 

7. Parallel interfaces transmit or receive a complete byte at a time, 
requiring 8 wires to carry the information. Serial interfaces pass only 
single bits, one after the other, on a single line. 

8. The RS423 (an improved version of the older RS232) 1s a serial 
interface which can handle certain printers, telephone signals via a 
modem or act as a data link between other computers. 

9. A 6850 chip handles serial and a 6522 chip handles parallel interfacing. 
10. The A/D conversion chip, a.D7002, allows any one of four varying 
(analogue) input voltages to be represented as a digital number. 

11. The heart of the floppy disk interface is the 8271 controller chip. 

12. Memory maps divide the 64K total into 64 pages. A machine address is 
a four-hex-digit number. The left-hand pair is the page and the right- 
hand pair the address on the page. 


13. 


14. 


16. 


17. 


18. 


20. 


19. 


20. 


24. 


22. 
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One page of memory contains 256 addressable locations (this is exactly 
100 in hex notation). | 
Sheila is page FE, occupying the address band & FE00 to &FEFF. It is 
reserved for accessing all interfaces built into the machine. 


. Fred is page FC, occupying the address range & FC00 to & FCFF. It is 


reserved for any specialised hardware fed from the | MHz bus, such as 
dealer’s test equipment, Teletext, Prestel, etc. 

A small area of the Fred page is reserved for the user’s special 
applications, &FCCO to & FCFE. 

The | MHz bus (see Fig. 1.4) provides eight address lines, AO to A7 and 
eight data wires via a two-way buffer (IC72). A 5 volt power supply 
and special lines are also provided. 

The special lines on the 1 MHz bus include: 


NNMI and NIRQ provide interrupt inputs to the microprocessor. 
They are activated by logic 0. 

NRST delivers logic | during a READ instruction and logic 0 during a 
WRITE. 

NRST delivers a logic 0 when reset is activated. 

NPGFC delivers a logic 0 whenever page FC (Fred) is addressed. 
NPGFD delivers logic 0 whenever page FD (Jim) is addressed. (The 
data buffer is only enabled if either Jim or Fred pages are addressed. 
The 1 MHz clock is an oscillator, sychronised to the 2MHz 
microprocessor clock but running at half the frequency. 
Analogue-in is connected to the sound channels. Any signal supplied 
is added to any other programmed sound. It is not a separate channel. 


Detailed information on the | MHz bus can be obtained from Acorn 
computers. 

Jim is the FD page, occupying the address range &FDO0 to & FDFF. It 
caters for a 64K memory expansion. 

Some devices require only a simple ON/OFF drive obtained by a 
simple decoder from the address bus. The data bus is not required. 

A 74LS138 is a useful decoding chip, capable of activating any one of 
eight devices. The address for each device is determined by the ‘l’s and 
‘0’s applied to the select and enabling inputs. 

Devices which deliver data from the outside world (input devices) 
should be gated with the read/write line (R/ NW) in case a WRITE 
instruction is given in error (see Fig. 1.9). 


Self test 


1.1 


If a ROM or EPROM is ‘empty’, what hex number appears when 
you read any location? 


32 


1.2 


iS 


Hromsduavics 4 


ew 


1.5 
1.6 
1.7 
1.8 
1,9 


1.10 
1.11 


1.12 


1.13 


1.14 


1.15 


1.16 
1.17 
1.18 
1.19 
1.20 


1.21 
1.22 
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You might expect a parallel printer interface to be eight times quicker 
than a serial. Why is there virtually no difference? 

State an advantage of the RS423 serial interface over the earlier 
RS232. 

Name a device which could convert, say, atmospheric pressure 
readings to a proportional voltage. 

What approximate voltage gives full-scale reading at the analogue/ 
digital input. 

What is the highest hex page number? 

What specialised names are given to pages FC,FD and FE. 

The user port and the Centronics printer interface are fed by a 6522 
VIA. On which side of the VIA is the user port? 

How would you find out which version of ROM operating system is 
installed? 

Which instruction in the 6502 will prevent IRQ being granted? 
Passing an ASCII character via the RS423 interface requires ten bits 
Why? 

What is the approximate voltage across a silicon diode when in 
forward conduction? 

With reference to Fig. 1.6, what address would activate pin 12 if the 
inverter to pin 6 was omitted in error? 

With reference to Fig. 1.8, if the maximum output is 2.55 volts, what 
output voltage is obtained when the digital data is 3F hex? | 
With reference to Fig. 1.7, assume the data bus is 4F hex and the latch : 
output is also 4F.-Supposing the data bus now changes to S5F, what is - 
the output from the latch prior to the arrival of the clock pulse? . 
With reference to Fig. 1.4 some pins on the 6502 have a baracross the - 
label. What is the significance? 

Some logic chips have a pin labelled CE. What does this mean? 
The address band within Fred allotted to User Applications is 
&FCCO to &FCFE. How many different addresses are within this 
band? 

The last address in the Fred band is &FCFF. What is significant 
about this special address? 

Sheila addresses are reserved for ...? 

Jim addresses are reserved for ...? 

Fred addresses are reserved for ...? 


Chapter Two 
The 6502 Microprocessor 


Useful abbreviations and conventions 


Isb = least significant bit. msb = most significant bit. 
Bit positions within a byte are numbered 765432 1 0. 
Bit O is the Isb. Bit 7 is the msb. 


A = the accumulator. X = register X. Y = register Y. 
P = process status register. PC = program counter. PCL= low 
byte of PC. 


PCH = high byte of PC. SP = stack pointer. ALU = arithmetic 
and logic unit. 

AR = address register. ARL = low byte AR. ARH = high byte 
of AR. 


Process status flags: 

N = negative (bit 7). V = overflow (bit 6). B = break (bit 4). 
D = BCD (bit 3). | = interrupt (bit 2). 

Z = zero (bit 1). C = carry (bit 0). 


The 6502 versus the Z80 


The 6502 and the Z80 8-bit microprocessors have retained their popularity 
with personal computer manufacturers for many years. Their popularity is 
likely to remain until the approaching 16-bit revolution is established. Both 
the 6502 and the Z80 have good and bad features which are fairly equally 
distributed. The Z80 has sometimes been praised as the more powerful of 
the two but, in the absence of a satisfactory definition of ‘power’ this praise 
has little substance. If by ‘power’ we mean execution speed then neither 
type is superior. Some types of program can execute faster on the Z80; 
others execute faster on the 6502. Because of this it is not wise to pay too 
much attention to ‘benchmark’ tests. Comparison tests for computers have 
about the same reliability index as intelligence tests for humans: they tend 
to test the tester more than the testee. The Z80 has a powerful marketing 
advantage because of its downward compatibility with the Intel 8080. The 
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widely used disk operating system CP/M, for which an enormous amount 
of commercial software has been written, is based on the 8080 instruction 
set, so any microcomputer which runs on the 6502 could be said to be 
disadvantaged in this respect. The justification for discussing the Z80 at all 
in this book (which is supposed oriented towards the 6502) is because one 
of the Second Processor options runs on the Z80. There will be a choice of 
another 6502, a Z80 or the new 16-bit 16032. Because of this, it is 
considered reasonable to include occasional references to all three 
microprocessors. The Motorola 6800 was the ancestor of the 6502. Apart 
from the indirect addressing modes which are unique to the 6502, they are, 
in many respects, similar. 


6502 architecture 


It is possible to delve straight into machine code programming without 
troubling very much about the technical details of the 6502. Indeed, the 
introductory book Discovering BBC Micro Machine Code plunged 
straight into a program on page 8. This book is the sequel, intended to fill 
up some of the gaps left in both the software and hardware treatment. It 
pays dividends in the long run ifthe internal behaviour of a microprocessor 
is understood. It can also be interesting for its own sake. 

It is better to begin by reviewing the microprocessor in relation to other 
main components of the system, as shown in Fig. 2.1. The operating system 
and BASIC language ROMs each have a capacity of 16K bytes (type 
number 13128). They are connected across the address and data buses. 
Note that the address bus is shown split down the middle because it is 
important always to bear in mind that a 4-hex digit address code is handled 
by the microprocessor in two halves, the lines AO to A7 (low-byte) and lines 
A8 to Al1S5 (high-byte). 

The RAM complement is not so straightforward because of the reduced 
packing density within the chip. Each RAM chip in the BBC machine, 
(type number 4816) stores only 16K bits (not bytes) so it is necessary to use 
sixteen of these chips to form a storage system of 32K bytes. (The Electron 
uses four of the new 64K bit RAMs to produce the total 32K bytes.) 
Another factor contributing to complexity is the ‘dynamic’ nature of the 
memory. The correct title for this class of read/write memory is DRAM, 
the ‘D’ prefix standing for dynamic. Due to the need for reducing current 
consumption and maximising packing density, each bit is stored within the 
inter-electrode capacity of MOS transistors (see Appendix A). The stored 
information, however, is a transient affair, leaking away in a few 
milliseconds. Consequently, each stored bit must be periodically recharged 
in order to compensate for the leakage. This process, called ‘refreshing’ is 
inherent in the hardware design and is not the responsibility of the 
programmer. However, the refresh-cycle takes up extra time. DRAMs are 
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6502 Microprocessor 
Address bus Data bus 


A8 — A15 
(high byte) / (low byte) 








SCREEN 
display 
section 


Peripherals 


Various external lines 





SSS SE SSS SS SSS SSS 


-“ 


| 


Fig. 2.1. Position of the 6502 in relation to the external bus devices. 


therefore a compromise in which access time is sacrificed in order to 
increase packing density and reduce cost. It is worth mentioning that the 
BBC and Electron systems are not alone in employing DRAMs. Nearly all 
microcomputers have, and still do, use them. The alternative would be to 
use static RAMs but the cost would be prohibitive and they would occupy a 
greater space on circuit boards. Having noted this, DRAMs will still be 
referred to as RAMs: distinction is academic. Note that, unlike the ROM 
chips, the feed to the data bus is bidirectional. 
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6502 systems are memory-mapped so it is not surprising that keyboard, 
screen display and the input/output interfaces are strung across the address 
and data buses as if they were memory chips. In the case of the screen 
display, the dotted line on the figure indicates the additional data path 
between the area of RAM dedicated to the screen display circuits. To avoid 
cluttering the diagram, the various signal lines forming the ‘control bus’ are 
not shown. 


Inside the 6502 


Figure 2.2 shows reasonable, but by no means complete, details of the 
paths between the various registers. Such paths within the microprocessor 
are often called highways because they ramify over the chip area to provide 
a kind of long distance communication. 


Clock 


6502 Microprocessor 


















Decode Matrix 


: Instruction Reg. 
Micropram ROM 


3| 
| 
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Control lines to 
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1 Stack pointer 


program counter 
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Address fregister 


We ae da 


Control bus Address bus O = highway gates Databus R/W 


Fig. 2.2. 6502 registers and highways. 
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The chaos is only apparent. Control lines (not shown) operate the input 
and output gates of each separate register, ensuring that only one pair is 
allowed access to the highway at any one time. For example, during the 
machine code instruction TAX, only register A output gate and register X 
input gate are open to the data highway, allowing the contents of A to be 
transferred to X. | 

The majority of instructions we give to microprocessors are data 
transfers, either between internal registers or between registers and the 
external RAM, ROM or peripherals. Some instructions, such as ADC 
(add with carry) perform arithmetical operations on the data but the data 
still has to be fetched from somewhere else. Even a simple instruction like 
INX (increment contents of X) involves a transfer because the X register is 
not equipped for altering itself. Instead, the contents of X must be 
transferred along the highway to the arithmetic section and subsequently 
returned. 


Directly programmable internal registers 


It is assumed that many readers will already be aware of the various 
registers and their functions but, for the sake of continuity a brief 
description follows together with the standard abbreviations subsequently 
used in all references. A distinction is made between directly pro- 
grammable and the other registers which, although playing a vital role, 
remain in the background, unseen by the programmer. 


Accumulator (A) 

This register has a supreme role. It is the only one capable of performing 
arithmetic processing. This is evident from Fig. 2.2 which shows that, in 
addition to the usual connection to the highway, there is a direct and 
exclusive link to the Arithmetic and Logic Unit (ALU). It is involved in 
transfers to and from memory and acts as interim data storage during 
arithmetic and logic operations. For example, during a simple addition of 
two numbers (ADC), the first number must pass to the accumulator and 1s 
then ‘entered’ to a holding register within the ALU. The second number 
then enters A, the addition is carried out and the result sent back to A. 
Those used to scientific calculators in the Hewlett-Packard range will 
recognise the inherent Reverse Polish (RP) action. 

It is worth digressing a little to explain RP.A Polish mathematician 
proposed a new method of expressing arithmetic, the essence of which was 
placing the operator (+, —, X etc.) after, instead of in between, variables. 
For example, instead of writing A+B to indicate addition, he proposed 
that it should be written AB+. Because his name was quite unpro- 
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nounceable (and almost unspellable in English) his system has become 
known simply as Reverse Polish Notation (RPN or simply RP). The 
influence of von Neumann on the evolution of the computer was mentioned 
in Chapter 1. He suggested that the arithmetic system of digital computers 
would operate most efficiently if based on RPN. Thus the ALU of the BBC 
machine, in common with nearly all other computers, requires the two 
variables first; the add operator is then activated and the result passed to A, 
replacing the previous contents. 

The dominance of the accumulator over other registers is evident from 
the instruction set of the 6502. However, the fact that only one accumulator 
is present gives ammunition for the protagonists of the rival Z80 which 
boasts eight accumulator-type registers. A single accumulator does tend to 
be restrictive in organising efficient machine code. 


The X and Y registers 

Like the accumulator, the X register and the Y register (subsequently 
referred to as X and Y)are both 8 bits wide. They have three primary uses in 
programming. 


(1) They make up for the inconvenience of the solitary accumulator. 
Important data residing in A can be transferred temporarily by the use of 
TAX or TAY and later, when A is free, transferred back using TXA or 
TYA. 

(2) They can act as up-counters or down-counters for setting up machine 
code loops. This is due to the ease with which they can be incremented or 
decremented by the instructions INX, DEX, INY or DEY. It is curious that 
the designers failed to provide an equivalent instruction for incrementing 
or decrementing A. The only way is by the relatively inefficient method of 
adding or subtracting 1, using ADC or SBC. 

(3) They are fundamental to the technique known as address modification 
by indexing. When using an indexed addressing mode (denoted in 
assembly form by a comma followed by X or Y), the data in the X or Y 
register is automatically added to the data in the operand. 

The resultant is interpreted as the address of the required data. This idea 
was pioneered by a team at Manchester University and, at the time, 
represented a huge step forward in computer science. They called the index 
register the ‘B box’, presumably to differentiate it from the accumulator A. 
Previous to this, altering the operand address in loops was cumbersome. It 
involved loading the operand from inside the program, incrementing it and 
then storing it back in the original position. In other words, it was necessary 
to alter the program in order to modify an address. Indexed addressing Is so 
much cleaner to work with and certainly less error-prone. Most of the 
indexable instructions in the 6502 allow a choice of using either X or Y for 
indexing. Although indexed addressing is later dealt with in detail, there is 
no harm in a little anticipation for the benefit of those who are new to the 
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idea. So, consider an example in which register X contains 30, and we 
write LDA 100,X. | 

The simple instruction LDA 130, however, would have the equivalent 
effect. They would both load the contents of address 130 into A. The 
advantage of the indexed over the simpler form will be apparent when 
organising loops involving action on consecutive addresses. 

This discussion should help to explain why the address bus, as well as the 
data bus, has access to the ALU. This should be understandable now it is 
recognised that the index register contents have to be added to the operand. 
After all, address modification by indexing produces a computed address 
and only the ALU can truly compute. 


The process status register (P) 

If we define a register as an internal memory location for holding or 
processing data, then the process status register (P) is not a register at all. It 
is, in fact, a collection of isolated flip-flops (see Appendix A), each capable 
of storing one bit. Each bit is called a ‘flag’ because it conveys certain 
information in yes/no form either for the benefit of the machine or the 
programmer. After most instructions, the relevant flags are updated, 
depending on the result. There is no connection, either in the hardware or 
software aspects, between different flags. In spite of this, it is convenient 
and conventional to refer to it as a register. It is important to the 
programmer to understand the exact significance of each flag, under what 
conditions they are set or reset, which are under the control of the 
microprocessor and which are directly programmable. 


The N bit: If this is 1, the last result contained a | in bit 7 position. The N bit 
is often misleadingly called the ‘sign bit’ because two’s complement 
arithmetic recognises bit 7 as the sign rather than magnitude. If the number 
is unsigned binary, the N flag merely indicates the state of bit 7. It is 
automatically set or reset and is not directly programmable. BMI (branch if 
minus) and BPL (branch if plus) are the relevant branch instructions 
conditional on the state of the N bit. Most instructions leave the N bit 
updated as part of the execution routine The notable exceptions being 
STA, STX, STY, TXA, and all branch and jump instructions (see 
Appendix C for complete coverage), LSR is unique in that the N bit is 
always reset to 0, irrespective of the result. | 


The V bit: If this bit is 1, it indicates that the last arithmetic instruction 
caused two’s complement overflow due to the result being outside the 
capacity of a single byte. It can be tested by the conditional branch 
instructions, BVS or BVC. It is of no significance to the programmer when 
using unsigned binary because bit 7 of the result represents magnitude 
rather than sign. In this case, it can be ignored. However, the V bit also 
plays a major role in the BIT test instruction, assuming the same state as bit 
6 of the data being tested. 
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It is possible to directly clear the V bit to 0 by the instruction CLV 
although there is no corresponding instruction to directly set it to 1. Only 
the instructions ADC, SBC, BIT, PLP, RTI and CLV affect the V bit. 


The B bit: This is set to | when a BRK instruction is encountered. Its 
significance is limited almost entirely to interrupt sequences. It cannot be 
directly programmed. 


The D bit: The 6502 can perform arithmetic on straightforward binary 
numbers or on BCD (Binary Coded Decimal) numbers. The programmer 
decides this by the use of either SED (set decimal) which makes D = 1 or 
CLD (clear decimal) which makes D= 0. The arithmetic mode currently in 
use remains until the D bit is altered. The default mode is D = 0. The 
instructions which affect the D bit are CLD, SED, PLP, and RTI. 


The I bit: This is called the interrupt mask bit or the interrupt inhibit. It is 
inspected by the microprocessor when an interrupt request is received from 
a peripheral source. If it is 1, the request is not granted. It can be directly set 
to 1 by SEI (set interrupt) or cleared to 0 by CLI (clear interrupt). These 
instructions are vital when designing the software for peripheral interfaces, 
most of which will be interrupt-driven. The instructions which affect the I 
bit are BRK, CLI, SEI, PLP and RTI. 


The Z bit: This is the zero bit, and is set to 1 whena result is 0. This is worth 
emphasising strongly because it is often interpreted back to front. Ifa result 
is non-zero, the Z bit goes to 0. It can be tested by the branch instructions, 
BEQ (branch if equal to zero) or BNE (branch if not equal to zero). There 
are no instructions which can directly effect it. Most instructions affect the 
Z bit. The exceptions include TXS, STA, STX, STY and the branch and 
jump instructions. 


The C bit: This is the carry bit, and is set to | when acarry out from the msb 
is detected. Instead of the bit ‘dropping on to the floor it is popped into the 
C bit. It can also be thought of as the ninth bit, particularly in shift and 
rotate instructions. It can be tested by the branch instructions BCS (branch 
if carry set) or BCC (branch if carry clear). It can also be directly 
programmed by SEC which sets C to | or CLC which clears C to 0. 
Instructions which affect the C bitare ADC, SBC, ASL, LSR, ROL, ROR, 
SEC, CLC, PLP, RTI, CMPCPX and Cry. 

It is clear from the above that the process status register flags have a 
profound effect on program behaviour. The majority of errors encoun- 
tered, particularly when setting the terminating conditions for loop exit, 
are due to misinterpreting the behaviour of the flag bits. Unless you are 
already confident in this area you would do well to reread the above 
treatment several times. 
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The stack pointer (SP) 

This is an 8-bit register, dedicated to the automatic control ofa special area 
in page one in RAM memory designated the ‘stack’. Its function is as an 
address generator. It is impossible to describe the stack pointer fully 
without describing the stack itself. Because the stack is so important in its 
own right, discussion of its anatomy will be postponed. It is sufficient at 
this point to grasp the following essentials: | 


(1) The contents of SP are interpreted by the microprocessor as the address 
of the currently vacant location in the stack. 

(2) To ensure the address is always on page I, rather than page 0, a 
permanent | is hardwired at the msb end of SP acting as a ninth bit. Thus, if 
SP itself contains 0000 0011 (0003 hex), the address is interpreted as if it 
were | 0000 0011 (0103 hex). That is to say, address 3 on page | rather than 
address 3 on page 0. 

(3) SP can be loaded initially to any address on page | but the method is a 
little cumbersome. There is no actual instruction to load SP directly. It is 
necessary first to load X and then transfer it to SP by the instruction TXS. 
This may seem a chore but in practice it may only have to be done once, 
during the initialisation phase of a complete machine code program. In 
fact, it would be generally unwise to tamper with SP at all when using the 
assembler because it will have been initialised by the ROM operating 
system. However, if you are brave enough to attempt circumvention, SP is 
normally initialised to FF hex in order to utilise the entire stack area. 
(4) Once initialised, the use of the stack is simplicity itself. If you want 
temporarily to save the contents of the accumulator, without having to 
specify a storage address, just push it on to the stack with PHA (Push A). 
To retrieve it again, pull it back with PLA (Pull A). It isnot possible to push 
X or Y directly but it can be done piecemeal by first using TXA or TYA. 
(5) The stack is a LIFO, meaning Last In First Out, memory so you must 
pull data back with this in mind. After every push, SP decrements by | in 
order for the next push to operate ona new vacant location. When data isto 
be pulled back, SP first increments by | (in order to point to the last stored 
item) before the pull operates. The stack pointer automatically ‘rises’ with 
each push and ‘falls’ with each pull so there is no need to bother with SP (see 
Fig. 2.3). You can forget the existence of the stack pointer providing you 
remember that the /ast item pushed onto the stack from A willalways be the 
first item pulled back into A. 

(6) The stack can only hold one page- 256 bytes, that is. If you overflow the 
stack, there is no friendly warning as in BASIC. All that happens is a ‘wrap 
around’ effect. For example, if SP is initially set to FF hex and data keeps 
piling on the stack, SP eventually reduces to 00. The next decrement wraps 
around to FF again causing weird and unexpected results. However, a 256 
byte stack is normally more than ample for most programs and overflow 
conditions should be rare. 
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Fig. 2.3. The stack and stack pointer. 


(7) In addition to its use as a temporary dumping ground for general work, 
the stack plays a vital role in both subroutines and interrupts. When a 
subroutine is called by means of JSR, the two bytes forming the return 
address (which will be in the program counter) are pushed onto the stack - 
high byte first, low byte second. When the subroutine ends (with RTS), the 
return address is pulled back from the stack (low byte first, high byte 
second) and passed to the program counter, allowing the body of the 
program to resume again. 


Registers which are not directly programmable 


In any microprocessor, some of the most important registers remain 
transparent (or at least translucent) to the programmer. That is to say, 
instructions are not provided which make direct reference to them. In fact, 
the more important a register, the less likely is the programmer allowed 
direct access. In the 6502, the unseen registers (refer back to Fig. 2.2) are the 
Program Counter (PC), the two address registers ADL, ADH and the 
Instruction Register (IR). 


The Program Counter (PC) 

This enjoys the honour,of being the only 16-bit register in the 6502. If there 
is an established register hierarchy, then PC is the undisputed candidate so 
its function deserves strong emphasis: 


The contents of the Program Counter is always the address of the 





next instruction byte to be executed. 
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The 16-bit length allows reference to any address in the entire 64K range. 
Once a stored program is commanded to ‘start execution’, the following 
automatic sequence begins: 


(1) The contents of PC is transferred to the address bus and the first 
instruction byte at that address is loaded into the computer and ‘processed’. 
(2) The PC then increments by I. , 

(3) The PC is again transferred to the address bus and the next instruction 
byte is loaded and ‘processed’. 


The sequence continues indefinitely, sweeping through the program bytes 
like a scythe until halted legitimately or an illegal code is reached. The 
sequence makes no distinction whatsoever between program and data. It is 
up to the programmer to arrange the instruction bytes in consecutive 
address order and organise either a break (BRK) or an orderly return to the 
operating system loop. If the PC is allowed to reach data bytes it will 
interpret these as instructions which the 6502 will either attempt to execute 
or crash in despair. 

It is all very well describing the sequence but how does PC know where 
the program starts? When entering a program under the direction of the 
assembler, there is no problem. It is simply a case of knowing the starting 
address of a program and assigning this to the reserved variable P%. In 
other words, P% appears to the user of the assembler as PC. But this 
convenience is by courtesy of the software built into the operating system 
ROM. As a matter of fact, the actual mechanism of loading the PC gives 
rise to a disturbing question which strikes at the root of stored program 
sequence control. This is the question: How is it possible to load PC with 
the starting address of the program unless there is already a program 
capable of performing the load action? 

This is a chicken and egg situation because we can’t fall back on the 
‘operating system’. The operating system is also a program so how was this 
loaded originally? There have been, and still are, various solutions to the 
problem although we are only concerned here with the method inherent in 
the 6502 microprocessor. When the reset line (RES) is momentarily 
grounded (usually arranged to coincide with the closing of the power-on 
switch) the following series of events take place: 


(1) All peripherals connected to the reset line are initialised to an orderly 
‘start-up’ state. The interrupt mask is set to | to prevent the possibility of an 
interrupt during the start-up sequence. 

(2) PC is loaded with the data which happens to be resident in the special 
addresses & FFFC (low byte) and & FFFD (high byte). These are called the 
Start-up vectors. 

(3) PC commences program execution because it now contains the starting 
address of the program. 


From the above, it is evident that the writers of the operating system must 





44 Advanced Machine Code Techniques for The BBC Micro 


ensure that the correct starting address is in & FFFC and &FFFD. It is 
equally evident that they must be in ROM (RAM can only be loaded with 
data by a program which already exists). Note that the concept of a 
vectored address allows the system programmer complete freedom to 
position the program anywhere. It would have been easier, of course, for 
microprocessor designers to lay down a mandatory starting address, say, 
‘all programs must start at address &0000’. This would allow PC to be 
initialised by a simple zero reset. However, the vectored address approach 
is flexible and we should remember that infinite flexibility has always been 
the goal of computer scientists. There are three vectored addresses in the 
6502 and, for completeness, these are shown in the following table: 


6502 vectored addresses 


Vector address Function 





&FFFA and &FFFB Non-maskable interrupt 
&FFFC and & FFFD Start-up/ reset 
&FFFE and &FFFF Interrupt request 


Although PC ensures that instructions are normally accessed and executed 
in consecutive address order, there are times when the sequence must be 
broken. When ajump or conditional branch instruction is encountered, the 
current contents of PC are altered drastically. In the case of an absolute 
jump, the entire contents of PC are replaced by the instruction operand. 
Branch instructions, however, use relative addressing rather than absolute. 
The operand is in the nature of an offset which is added to, rather than 
replacing, the existing contents of PC. Since the offset is in two’s 
complement binary (allowing positive or negative numbers) it is still 
possible to branch forward or backward. 


The Instruction Register (/R) 

The first byte of all machine code instructions is the operation code 
(abbreviated to ‘op-code’). The code, which is different for every 
instruction and addressing mode, carries two vital pieces of information: 


(1) What kind of operation is required. 
(2) How many operand bytes (if any) are still required to complete the 
instruction. 


On receipt of the code from memory (Known as the FETCH phase) it is 
routed via the highways to IR where it is held pending execution. If the 
decoding reveals that the instruction requires no further operand bytes 
(such as TXA, TAX etc.), the instruction sequence enters the EXECUTE 
phase. If, on the other hand, decoding reveals that one or more operand 
bytes must follow, the sequence remains in the FETCH phase until the 
complete instruction has been received from memory. 
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The Data Register (DR) 

The data bus carries information downwards from the microprocessor 
when writing to memory and upwards to the microprocessor when reading 
from memory. Because of this, DR operates as a bidirectional holding 
register, controlled by the R/W line. You will remember, from earlier 
discussions, that when R/ W is in the high state (logic 1) the DR would be 
switched to the READ direction, and to the WRITE direction when in the 
low state (logic 0). The power levels on the raw bus are weak and external 
buffers may be needed to boost the power. A full 64K of memory with 
additional peripheral loads could lead to degradation of logic levels (see 
Appendix A). 

Whilst on the subject of the data bus, it is convenient to discuss the effect 
of data-jamming. It is essential that all memory and peripheral devices 
connected directly to the data bus are equipped with ‘tristate’ outputs. That 
is to say, when the devices are in the disabled state, their connections to the 
data bus should be electrically impotent. Tristate devices ensure this by 
effectively open-circuiting the outputs during the disabled state. 


The Address Register (ARL and ARH) 

A 4-hex digit address describes a 16-bit logic pattern on the address wires 
AO to AIS. The address information can originate from several possible 
sources. It could originate from A, the output of the ALU or even the data 
bus. From whatever source, it will eventually be routed along the highway, 
ending up in the address register. This register is split into two halves, each 
contributing a byte to the two-byte address. The lower order byte (AO to 
A7) is held in ARL and the high order byte (A8 to AI5) in ARH. As 
discussed earlier, the high byte determines the page address and the low 
byte the address on the page. The individual lines on the address bus are 
direct outputs of the registers. They are, of course, always outputs so the 
R/W control line is not involved. It should also be noted that, unlike the 
data bus, devices connected to the address bus need not be tristate. This is 
because the address bus is always an output from the microprocessor 
intended to feed only the address decode circuits of memory or peripheral 
devices. Only the address registers can supply the bus so there is no 
possibility of data jamming by alternative logic voltage sources. 


The microprogram 


The term ‘microprogram’ has nothing to do with programs written for a 
microcomputer. In fact, microprograms are those which are buried inside 
the silicon of the microprocessor chip itself! It may surprise some readers 
that every instruction in the repertoire (about 200 in the 6502) requires its 
own special microprogram. A simple machine code instruction like LDA 
&72 is simple only from the viewpoint of the human intellect. In contrast, 
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logic circuits (which are baffled if required to answer any question with 
other than yes or no) require considerable assistance in dealing with LDA 
&72. They need micro-instructions, fed one at a time in order to open and 
close the appropriate register gates and activate the control lines. These 
micro-instructions must be given in the correct sequential order for every 
individual instruction. Since a sequential set of instructions is, by 
definition, a program, then it becomes evident that the earlier statement is 
justified: Every instruction needs its own microprogram. 

We do not propose to examine in detail each of these microprograms. 
This would take more space than this book allows. However, it is 
interesting to examine a possible microprogram for the simple instruction 
mentioned previously: LDA &72 will LoaD A with the data stored at 
address &72 on page 0 hex. This instruction consists of two bytes, which we 
will assume are residing at addresses &2E34, &2E35. The microprogram 
will first have to fetch these two bytes from memory. 


The FETCH phase: 


PC, having just dealt with the last byte of the preceding instruction, will 
already have been incremented to &2E34. A typical sequence would be: 


(1) The contents of PC pass to ARL and ARH. 

(2) The R/W line goes or remains high, causing the op-code (LDA) to be 
read from memory and passed, via the data bus, to DR. 

(3) The contents of DR are then passed to IR and the instruction is 
decoded. From this decoding, the system now ‘knows’ there is a single 
operand byte to follow. PC is incremented to &2E35. 

(4) The contents of PC pass to ARL and ARH. 

(5) The memory is again read, causing the first operand byte(&72) to enter 
DR. 

(6) PC is again incremented. 


The complete instruction is now lodged in the microprocessor registers, 
ending the FETCH phase. The EXECUTE phase now begins. 


The EXECUTE phase: 

(7) The operand (&72) in DR is passed, via the highway, to ADL. ADH is 
cleared to zero (because it is a page 0 address). 

(8) The memory is read, and the data at address &72 is passed to DR. 
(9) The contents of DR are passed to A. 


The instruction has now been executed with the PC left pointing to the 
address of the first byte of the next instruction. 


The instruction chosen in the example was particularly simple and yet 
the microprogram was quite involved. It is left to the imagination to 
visualise the microprogram for ADC (&72), X (post-indexed indirect 
addressing). 
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Microprogrammers are a specialist breed and usually employed on the 
design staff of the chip manufacturer. It is fortunate that the brief outline 
above on microprogramming was included as a topic of interest only. The 
normal machine code programmer takes each complete instruction for 
granted and is oblivious to the existence of the internal microprogram 
steps. If we call machine code a low-level language, then micro- 
programming is at ground-level! | 


The decoding matrix 


Figure 2.2 shows the decode matrix. Its function is to accept the op-code 
held in IR, decode it, and finally output a pattern of bits on the various gate 
and timing controls. This pattern will be different for every step in the 
microprogram. If this function is analysed carefully, we may come to the 
conclusion that the decode matrix will behave like a miniature computer 
with a number of fixed programs inside. We can relate IR to the ‘program 
counter’. The op-code is only the starting address of the relevant 
microprogram. The ‘words’ read out from the ROM are the bit patterns 
supplying the various register gates and controls. These patterns will vary 
for each step of the microprogram. The gate controls are all hardwired to 
the various registers. This wiring is omitted from Fig. 2.2 to prevent an 
already complex diagram from becoming incomprehensible. 


Sub-pulses of the clock 

It is not always appreciated that the clock pulses, which in the 6502 are 
running at 2 MHz (0.5us period), are split up within the decode matrix. 
Several sub-pulses are formed, each sub-pulse initiating each step of the 
microprograms. Within the matrix, the clock pulses are merely the ‘low- 
frequency’ envelope of the sub-pulses. 


The Arithmetic and Logic Unit 


Addition, subtraction and logical instructions will obviously be the 
responsibility of the ALU. However, in the interests of versatility, nearly all 
data is made to pass through the ALU irrespective of the particular 
instruction. For example, data can pass through the ALU without change 
by adding zero. This may seem time-wasting but is justifiable from a wider 
viewpoint of a widersystem. Forexample, address modification byindexing 
involves adding the contents of X or Y to the operand so the ALU is directly 
involved. 

The 6502 is incapable of multiplication, division or exponential 
operations. It is not alone in this respect. It is very rare to find 8-bit 
microprocessors capable of performing any arithmetic instructions other 
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than addition. Even subtraction is achieved by the roundabout way of 
adding the complement. 

Before criticising these limitations, it should be remembered that the 
microprocessor was designed with the primary objective of controlling 
electrically operated devices and a primitive instruction repertoire was 
quite sufficient for the purpose. It was never intended to be the brain of a 
general purpose computer. However, if a machine can add, it is fairly easy 
to write subroutines which can multiply, divide and handle exponentials. 
Users of BASIC, or indeed most other high level languages, are unaware of 
the primitive capabilities of the microprocessor although they have to pay 
for it by reduced execution speed. Software solutions are always much 
slower than hardware implementation. 

The new breed of 16-bit microprocessors are virtually second generation 
products many of which are including instructions which perform direct 
multiplication and division at an impressive speed. 

The design of an ALU is based on a parallel binary adder which can be 
considered as the prototype. With this as a basic building block, it is a 
relatively simple exercise in logic to arrange gates for implementing 
exclusive-or (EOR), logical anding (AND) and the inclusive-or (ORA) 
functions. Finally, it would only require ‘function select’ inputs to complete 
the transformation. Four of these, driven by the output word from the 
control matrix, could activate any one of 16 functions. 


Software Interrupt (BRK) 


The details of all machine code instructions are given in the relevant 
chapter but it is convenient at this stage to introduce the BRK instruction. 
Interrupts are normally the prerogative of peripheral devices but BRK is 
software initiated. Superficially, it just stops the computer but adig beneath 
the surface reveals some interesting side-effects. The instruction takes 7 
clock cycles to complete the following steps: 


(1) It sets the B flag in the process status register (P). 

(2) Adds 2 to PC. 

(3) Sets the I and B bit in P then pushes P on stack. 

(4) Loads contents of & FFFE into PCL and FFFF into PCH. 


The motive behind this seeming complexity is to aid the writing of software 
error traps during program development. It is common practice to put 
BRK at strategic ‘bug-hazard’ points. This would be useless if the sole 
function of break was to kill all program flow completely. However, it will 
be seen from the above that a convenient loop-hole is prepared. Providing a 
routine is written, with the start address residing in the Break Vector at 
& FFFE/&FFFF, control is automatically diverted to the routine rather 
than’ stopping dead. The routine must establish, by pulling P back from the 
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stack, that the B bit was set as a result of a true BRK rather than a genuine 
peripheral interrupt. | 


Summary 


i. 


Z. 


10. 


11. 


12. 
13. 


14. 


15. 


16. 


17. 
18. 


Both the Electron and the BBC machine use the 6502 and both use the 
same machine code assembler. 

The 6502 does not allow disk software written under CP/M. To utilise 
CP/M, the BBC Second Processor controlled by the Z80 micro- 
processor is required. 


. The BBC machine uses sixteen type 4816 chips to form 32K of RAM. 
. All internal registers communicate by means of input/output gates at 


the entrance to the highways. The gate controls ensure that only data 
from one register output occupies a common highway. 


. Not all registers in the 6502 are directly programmable. 
. The single accumulator (A) is the only register equipped with 


arithmetic facilities. There are no incrementing or decrementing 
instructions. 


. X and Y registers are used as transfer registers to and from A, loop 


counters and indexed addressing modes. 


. Address modification by indexing consists of adding X (or Y) to the 


operand address. 


. The process status register (P) is a collection of seven independent flag 


bits, each signalling some important result. Most instructions keep P 
updated. Conditional branch instructions depend on certain flags. 
The N flag = 1 if bit 7 is set. The Z bit = | on a zero result. Neither are 
directly programmable. 

The V bit = 1 ona two’s complement overflow result. It can be ignored 
if the data is unsigned binary. The V bit copies bit 6 during the BIT test. 
It can be directly programmed by CLV and SEV. 

The B bit = 1 if an interrupt occurs as a result of BRK. 

The D bit can only be directly programmed. SED makes D = | causing 
subsequent numerical data to be processed in BCD format. CLD 
makes D = 0 causing subsequent numerical data to be processed in 
BCD format. 

SEI sets the I bit, preventing interrupt when requested by IRQ. It is 
cleared by CLI. 

The C bit sets to | on detecting a carry-out from the msb. The C bit can 
act as a ninth data bit. 

The stack is any dedicated area in page | of RAM. PSH pushes A to 
stack. PLA pulls A back from stack. 

The address of the next vacant stack address is maintained by SP. 
SP is automatically incremented after a push and decremented before a 
pull. This causes the stack to rise and fall. 
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19. 


20. 
21. 


Z2. 


z3: 


24. 


25% 


26. 


Zt. 


28. 
29. 


30. 


34. 


32. 


33. 


34. 


SF 


36. 


Sf, 


The contents of SP can be initialised or changed only by means of 
TAS. 

Access to the stack must obey the rule, last in — first out. 

The stack is used by subroutines to hold the return address, pending 
RTS. 

The program counter (PC) is the only 16-bit register and is not directly 
programmable. It is in supreme control of the program sequence by 
always pointing to the next byte in the program. 

PC is altered directly by JMP. An offset may be added by relative 
address action during conditional branches. 

A pulse on the reset line (RST) initialises peripheral devices to zero. PC 
is then loaded by the contents of the address pointed to by the start-up 
vector at & FFFC, & FFFD. 

The instruction register (IR) holds the op-code fetched from memory. 
The op-code defines the instruction type and carries information on 
the number of operand bytes to follow. 

The data register (DR) is a bidirectional buffer between the data bus 
and the highways. 

Devices which deliver outputs to the data bus must be tristate. 

The address register is in two halves, ADL and ADH, holding low and 
high bytes of address respectively. 

Each instruction is fetched and executed by an individual micro- 
program. 

Microprograms are built into the 6502, and selected by the op-code 
currently resident in IR. 

Each step of the microprogram outputs a series of bits on the control 
lines, opening the appropriate highway gates, R/ W lines, etc. in the 
correct sequence. 

A microprogram is divided into the FETCH phase which, apart from 
the number of bytes fetched, will be similar for all instructions. The 
EXECUTE phase is different for each instruction. 

The 2 MHz clock pulses are subdivided within the decode matrix to 
form many sub-pulses. Sub-pulses are used for timing the individual 
steps of a microprogram. 

The ALU is built round a parallel binary adder. Subtraction is 
achieved by adding the two’s complement. EOR, ORA and AND 
operations are incorporated by modified addition. 

Multiplication, division and exponentiation is not catered for. They 
are relegated to software solutions. 

BRK provides software interrupt. It sets the B bit and I bit, pushes the 
status register to stack and loads the contents of the break-vector 
locations into PC. 
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2.19 
2.20 
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What kind of RAMs require refreshing? 

Why are static RAMs unpopular in microcomputers? 

What is meant by ‘memory-mapped peripherals’? 

How is incrementation performed in the X and Y registers? 

In what important respect does the accumulator differ from other 
registers. 

What is the effective address (in hex) if Y contains &04 and the 
instruction is LDA &2CFF,Y. 

Why is the process status register a non-typical register? 

Which flag bits can be directly programmed? 

There are two flag bits which can only be changed by direct 
programming. Which are they? 

Which flag bits cannot be directly programmed? 

Under what circumstances can a programmer safely ignore the status 
of the V bit? 

Apart from detecting overflow, what other significance has the V bit? 
If the Z bit is 0, what can you deduce from this? 

Which flag can be treated as a ‘ninth’ bit? 

Whereabouts in the system is the stack? 

How can the stack pointer be initially loaded? 

Is the stack pointer automatically incremented before or after PLA? 
If data items A, Band Care pushed onto the stack in that order, which 
item is pulled back first? 

During a subroutine call by JSR, the two-byte return address is 
pushed to stack. Which byte is pushed first, high or low byte? 
From which register does the return address (mentioned in 2.19) 
originate? 

What address is left in the program counter after a byte is fetched? 
Name the control line which defines the direction of the data register? 
How does a microprogram differ from a conventional program? 
In what register is the op-code held during decoding? 

How can a programmer detect whether an interrupt was caused by a 
peripheral request or by software? 

For which microprocessor instruction set was CP/M written? 





Chapter Three 


The 6502 Instructions 
and Addressing Modes 





Initial terms and definitions 


Some readers will be aware of the following points but repetition is not 
always valueless. In any case, the terms used to describe aspects of machine 
code are far from standardised. The previous book, Discovering BBC 
Micro Machine Code, examined each separate instruction in reasonable 
detail and it would be pointless to go over the same ground here. Instead, 
the complete instruction set is relegated to Appendix C which should be 
consulted frequently during reading this and subsequent chapters. When 
programming in higher level language suchas BASIC, anindividual order to 
the computer is called a statement. For example, Energy = Mass*C “2 isan 
example of a statement. 

In machine code, orders given to the computer are by means of 
instructions. Instructions are primitive and many are needed to form the 
familiar high level statements. An instruction will normally consist of an 
op-code to indicate the required action and an operand to indicate where 
the data is to be found. Sometimes, the location of the data will be obvious 
from the op-code but, in the general case, an operand is required. 

There are several ways in which the operand can specify the location of 
the data. They are known as addressing modes and there are thirteen of 
them in the 6502 although not all of these are available to every instruction. 
Because one byte is used for the op-code it would be possible to have 256 
different ones. However, 90 of the possible combinations are reserved for 
‘future expansion’ (illegal in other words). This leaves 166 valid 
instructions to choose from. The task of selecting the most suitable op-code 
is less bewildering than it appears from the figures. There are only 56 
completely different instructions. It is the available addressing modes for 
each instruction which multiply the choice. 

The op-codes are specified by means of a pair of hex digits. There is a 
different op-code for every variation of addressing mode. However, the hex 
coding is really of academic interest because all machine code on the BBC 
machine will be entered by means of the resident assembler. The details of 
the assembler will be discussed fully in Chapter 4. The most useful property 
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of an assembler is the facility to enter op-codes in three-letter mnemonic 
form. The desired addressing mode is indicated by the form in which the 
operand is written. The repertoire of instructions is set out formally in 
Appendix C. Consequently, the purpose of this chapter will be to explain 
the symbols, to define the addressing modes and to offer guidelines on the 
choice of a particular instruction and the most suitable method of 
addressing. 


Factors influencing choice 


It is not easy to give a specific answer to the question ‘What is the correct 
instruction to use here?’ The choice is very often a compromise between 
execution speed, memory economy and the demands of structure. 
Newcomers to machine code may be quite satisfied if their subroutine 
works at all but it soon becomes apparent that there are good and not so 
good variants. It is popularly supposed that a program written in machine 
code will always be much faster and take less memory than the BASIC 
version. This is a reasonable generalisation but not a universal truth. A 
poorly written machine code program could be slower than the BASIC 
equivalent. Even if it is faster, it is well to remember that a speed advantage, 
to have any real meaning, must be assessed on human, rather than machine, 
time scales. If a BASIC version runs in one second and the machine code 
version runs in a millisecond, the advantage is academic rather than visible. 
The items of information needed to assess the merits of each instruction are 
as follows: 


(1) What does it do? This information is conveyed by a three-letter 
mnemonic such as LDA or ADC. Although the mnemonic itself conveys a 
reasonable idea of what the instruction does, it is primarily intended as an 
aid to the interpretation of a listing. It cannot cover all the subtleties. It is 
necessary to augment the mnemonic by either a verbal definition or by a 
loosely standardised format known as operational symbols (discussed 
later). 

(2) What addressing modes are available? 

(3) What flags in the process status register are altered (updated)? 
Ignorance or confusion in this area is the cause of many an intractable bug. 
(4) How many clock cycles does it consume? The BBC machine runs at 
2 MHzsoeach clock cycle is half a microsecond. The number of clock cycles 
is influenced more by the addressing mode than the actual instruction. Clock 
cycle time is particularly critical if the instruction is within a loop which 
revolves many times. Outside a loop, it is seldom important enough to 
influence choice. 

(5S) How many bytes are in the instruction? All instructions take at least one 
byte because they all have an op-code. The operand, however, can be 
absent altogether, one byte long, or two bytes long. Knowledge of the 
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number of bytes required can be helpful. For example, it can be a matter of 
doubt in certain circumstances whether to write &004B or &4B in the 
operand. They are mathematically the same but an incorrect choice can 
cause havoc to the program. 

(6) What is the hex op-code? Programming will always be performed with 
the aid of the assembler which uses mnemonic op-codes. However, it is still 
necessary at times to be aware of the hex coding for every instruction 
because the assembled machine code program will include it. It is easy to 
use an incorrect address mode by mistake when writing the operand but the 
hex code, which is specific for the addressing mode, might highlight the 
error during debugging. It is interesting, but not particularly rewarding, to 
write out the hex code in binary. It gives an insight into the mind of the 
microprocessor designer because some intriguing patterns emerge which 
can give a clue to the microprogram within the chip. 

(7) What is the correct syntax for the operand? This depends on the 
addressing mode and the rules are rigid, more so than in BASIC. The 
assembler does its best but it would be foolish to add user-friendliness to its 
list of virtues. Make a mistake and you are on your own! 


Operational symbols 


Universities have traditionally considered computing and data processing 
subjects to be the prerogative of the mathematics department. The 
computer is useful as a tool in mathematics so it was considered only natural 
that computing should be taught by mathematicians. Whether this has 
helped or hindered progress may be arguable. There is no denying that a 
mathematical brain was behind the establishment of operational 
symbols. 

How do we describe exactly what an instruction will do, bearing in mind 
that there must be one, and only one, interpretation? Normal language is 
one way; perhaps the obvious way. But, to a mathematician, normal 
language lacks precision and is difficult to formulate concisely without 
using a lot of ifs and buts. Operational symbols are concise and 
unequivocai. They explain what the instruction does but make no attempt 
to explain the meaning of the operand. This is understandable because the 
meaning of an operand depends only on the addressing mode chosen. For 
example, the instruction LDA will have the same operation symbols 
whether it is using immediate, zero-page, absolute, indexed or indirect 
addressing. The general pattern. of operational symbols is of the form: 


Action — Result 


The arrow denotes the direction of data transfer and is preferable to the 
= sign sometimes used. The abbreviations used for the registers are those 
already used but M is used to represent the data specified by the operand. 
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As a simple example, the instruction STA could be described as follows: 
A-—-M 


This means ‘Store a copy of the contents of the accumulator in the 
address specified by the operand’. Note that the arrow points from the 
source to the destination and only the destination contents are over-written 
by the new data; the source data is preserved. 

To take a little more complex example, the instruction ADC could be 
described concisely as follows: 


Aree A 


This means ‘Add together the present contents of the accumulator, the data 
specified by the operand, and the carry bit, then place the result in the 
accumulator’. 

The shift and rotate instructions are fearsome looking. For example, the 
instruction ASL (which is Arithmetic Shift Left) has the operational 
symbolism: 


72.0) - 0 


The bracketed expression indicates the bits within a byte numbered 0 to 7. 
The action shows that a zero enters from the right and overspill from bit 7 
goes into the carry. 


Classification of instructions 


There are many ways of classifying instructions. Appendix C simply lists 
them in alphabetical order by mnemonic group. This is useful as a quick 
reference but is by no means a scientific classification. Appendix C2 
classifies them according to the flags affected in the processor status 
register and can be quite useful. Appendix C4is an attempt to classify them 
according to ‘popularity’. It is undeniable that some instructions out of the 
56 are used a lot, some are used at times and a few are used spasmodically. 
Unfortunately, the choice of instructions to perform a given task is very 
much an individual affair. Some programmers have a particular liking fora 
certain subset. Indeed, it is often possible to recognise a friend’s handiwork 
from the listing which can be almost a fingerprint. Because of the individual 
character, Appendix C4 can be no more than the author’s personal choice 
although it might help those who are initially bewildered. 

In this chapter, the instructions will be introduced (rather than 
classified), according to need. No account will yet be taken of the various 
addressing modes under each mnemonic. 
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Finding temporary homes for data 


Due to the single accumulator in the 6502, it is often necessary to find a 
temporary home for existing data. There are several choices: 


(1) Transfer A to another register by the use of TAX or TAY and later 
restore by [XA or TYA. This is the simple and speedy solution because 
they are both single-byte instructions, taking only two clock cycles. The 
trouble is that existing data in the X and Y registers may also be valuable 
and must not be overwritten. X and Y are often totally committed for 
indexing or loop counting. 

(2) Push A to stack by using PHA and retrieve later by PLA. These are 
single-byte instructions but they take three clock cycles. It is important to 
bear in mind the LIFO (last in first out) nature of the stack. Mistakes in the 
order of retrieval could result in false data entering A. Another danger, of 
course, 1s stack overflow although this should be a comparatively rare 
event. 

(3) Store A ina memory location by use of STA and retrieve it with LDA. 
This will take three clock cycles if the location is on page zero and four on 
any other page (indexing and indirect addressing can take five or six 
cycles). 


Performing arithmetic 


There are only two direct arithmetical instructions, ADC and SBC for 
addition and subtraction respectively. The carry is always involved and, to 
avoid introducing garbage carries left over from a previous operation, it is 
important to be aware of the following rules: 


(1) Before using ADC, the carry should normally be cleared with CLC. 
(2) Before using SBC, the carry should normally be set with SEC. 


Although in some circumstances the carry can be treated as the ‘ninth bit’, it 
should be borne in mind that this is purely a way of looking at it. Obviously, 
this ninth bit is not transferred by STA, TAX or TAY. 

Addition and subtraction of single byte numbers are, of course, severely 
limited in the range of the result (255 in unsigned binary and +127 and —128 
in two’s complement binary). Fortunately, the carry bit allows double or 
multiple byte numbers to be added or subtracted because it can act as the 
continuity element between the msb of one byte and the Isb of the next. 
Thus, the carry is only cleared before the two lower order bytes are added. 
The higher order byte additions will include the carry over (if any) from any 
previous process so it would be fatal to clear the carry first. 

It is important not to forget that there are two arithmetic modes 
depending on the D flag being set or cleared. The default condition is D = 
0, which is the normal two’s complement binary arithmetic mode. It is wise, 
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though, to ensure the default condition by initialising with CLD at the head 
of a program. On the rare occasions when decimal (BCD) mode is required 
then the initialisation begins with SED, but remember this mode continues 
until cancelled again. 

Multiplication and division is possible by a tongue-in-the-cheek method 
using ASL and LSR respectively. The operations are limited to integral 
powers of two. Watch must be kept on overspill from the msb in 
multiplication and the Isb in division. 

Subject to overspill into the carry, shifting left by ASL will multiply by 
two each time so four consecutive ASL operations will multiply the 
existing data by 16. Division by two is achieved by LSR although we must 
remember that the overspill from the right (from the Isb) goes into the 
carry. As a matter of interest, the reason why LSR is named Logical Shift 
Right is due to this very reason. It is arithmetically absurd for carry status 
to be in the Isb position, hence it is deemed to be ‘logical’ shift. This is in 
contrast to ASL (Arithmetic Shift Left) where the carry action is at the msb 
end. Unless the programmer is sure, from previous knowledge of the data, 
multiplication and division by these instructions must check for the 
presence of a carry after each use. There will be exceptions, of course, such 
as when multiple-byte precision is used. In these circumstances, the carry 
will be providing continuity between the component bytes when used in 
conjunction with ROL or ROR. 


Clearing memory and registers 


There are no instructions in the 6502 which can clear any of the registers or 
memory locations to zero. The usual way to clear registers is to store zero in 
them. Toclear memory locations, a previously zeroed register can be stored 
in them. Those who are fascinated by novelty may be attracted by the 
following little snippet: 


Exclusive-oring data with itself always results in all zeros. 


For example, if A contains &9D and we write EOR #&9D, the 
accumulator result is &00. (To confirm, write out the example in binary 
form.) 


Up-counting and down-counting 


Counting is essentially an adding-by-one operation and implies ‘up- 
counting’. It is also called incrementing. Down-counting is subtracting by 
one. Itisalsocalled decrementing. The X and Y registers can be counted up 
or down by the single byte instructions INX, INY, DEX and DEY, each 
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taking only two clock cycles. Data in memory can be incremented or 
decremented by means of INC or DEC but not economically. They each 
take five to seven cycles depending on the addressing mode in use. 

The accumulator is left out in the cold, lacking an increment or 
decrement instruction. It can, of course, be done by adding or subtracting | 
which, like DEX or INX only takes two clock cycles, but it requires two 
bytes even for the immediate addressing mode. There is also the possibility 
that the carry might have to be cleared first which, if forgotten, could lead 
to a mystery bug. An alternative is some roundabout method suchas TAX 
then INX then TXA, providing of course, that X (or Y) is free. 

Counting is an essential part of loop control. The number of loop revs 
can be achieved either by starting with N and counting down to zero or 
starting with | and counting up to N. The advantage of the count down 
method is that testing for loop exit can be achieved with BNE or BPL. 
Unfortunately, it is very easy to be ‘one out’ in the count down. If we count 
up to N, an extra comparison instruction such as CPX, CPY or CMP is 
required to check the exit condition but the method may have the 
advantage of seeming more ‘natural’ and errors by one are less likely. 


Processing particular bits 


There will be times when it will be required to operate on one or more 
particular bits within a byte, rather than on the entire byte. We may wish to 
ensure, say, that bit 3 is set to | without altering the remaining bits. The 
possible operations fall into three main groups, c/earing bits to zero, setting 
bits to | and finally, changing bits. This is achieved by using one of the three 
‘logical’ instructions AND, ORA and EOR in conjunction with the 
appropriate mask word in the operand. The action is always on the 
accumulator. 


To clear selected bits: 

Use AND with an operand mask as follows: ‘l’s in the mask will leave 
corresponding bits unchanged. ‘0’s in the mask will ensure that 
corresponding bits are 0. 


To set selected bits: 

Use ORA with an operand mask as follows: ‘0’s in the mask will leave 
corresponding bits unchanged. ‘l’s in the mask will ensure that 
corresponding bits are 1. 


To change selected bits: 

Use EOR with operand mask as follows: ‘0’s in the mask will leave 
corresponding bits unchanged. ‘l’s in the mask will ensure that 
corresponding bits are changed. 


The-explanation for the above behaviour can be found in Appendix A 
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under the heading Logic Gates. However, the following examples may help 
in understanding how to work out the correct mask: 2 


(a) To ensure that bit Sin the accumulator is a0, use AND #& DF (the mask 
in binary is 1101 1111). 

(b) To ensure that bits 2 and 6 in the accumulator are ‘I’s, use ORA #&44 
(the mask in binary is 0100 0100). 

(c) To ensure that bit 3 in the accumulator is Ate use EOR #&08 (the 
mask in binary is 0000 1000). 


One’s complement of accumulator 

It is sometimes necessary to flip all the bits in a byte (i.e. produce the one’s 
complement). Assuming the data is already in the accumulator, this can be 
done by exclusive-oring as follows: 


EOR #&FF or EOR &255 


Two's complement of accumulator 

The two’s complement is obtained by adding | tothe above. Unfortunately, 
we can’t add the | by incrementing because the result is in the accumulator. 
The only way is to follow with ADC #1, making sure to clear the carry first. 
The coding is as follows: 


EOR #&FF 
a IF & 
ADC #1 


Since the two’s complement of X isO—X, analternative method 1s simply to 
subtract the number from zero. This is, by definition, the two’s complement 
but would entail storing the data first before loading the accumulator with 
0. 


Finding the state of a particular bit 

It is sometimes important, particularly in peripheral control, to find out the 
state of one particular bit within a byte. This can be done by loading the 
byte into the accumulator, erasing all bits except the one of interest, then 
testing for zero. If the result is non-zero, the bit must have been a |. For 
example, suppose we are interested in bit 3, the coding could be: 


LDA data 
AND #08 (0000 1000) 
BNE etc. 


An alternative method, which only works if bit 6 or bit 7 is involved, is the 
BIT test. For example, we can start by writing: 


BIT data (‘data’ is an arbitrary address) 
This copies bit 6and bit 7 of the data into the V and N bits respectively. This 
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can be followed by BVS or BMI as required. The BIT instruction takes 3 
clock cycles if data is on page zero but otherwise 4 cycles. As a bonus, the 
bit test also logically ANDs the data into the accumulator. If this is 
a nuisance rather than a bonus, the accumulator should be stored 
first. Because of this, use of the BIT test is nota commonly used instruction. 
Besides the three logical instructions AND, ORA and EOR, the shift and 
rotate instructions LSR, ASL, ROR and ROLare also used to play around 
with bits. LSR and ASL should be thought of as ‘open-loop’ operations 
because bits are lost if the carry is already full. In contrast, ROR and ROL 
are ‘closed-loop’ because the bit pattern circulates. They can all play an 
important role in peripheral work and some off-beat requirements. The 
shift and rotate instructions are unique in having ‘accumulator’ addressing. 
Thus, they can act on the accumulator ora memory location. If the action is 
required on the accumulator, the mnemonic must be followed by A. For 
example, to shift the accumulator right, we must write LSR A. When using 
accumulator addressing, no operand is necessary (the ‘A’ is not a true 
operand and does not consume a byte). Because of this, it should be noted 
(because it is a common mistake) that the shift and rotate instructions must 
either have an operand or an ‘A’. For example, a naked LSR 1s illegal. 


Double-byte multiplication 

This provides a useful exercise in shift and rotate operations. Although 
ASLand ROL both multiply by two, the carry can be a problem if they are 
not chosen wisely. No carry must be allowed to enter the lower order byte 
from the right so ASL is appropriate. On the other hand, the higher order 
byte must take into consideration the carry from the right so ROL must be 
used. Assuming the data is in two bytes of memory, the coding would be: 


ASL low-byte 
ROL high-byte 


Double-byte division 

The opposite is required here. Thus, the higher order byte must be 
attacked first and a carry must not be allowed to enter from the left. This 
suggests LSR as the first step. The lower order byte must receive a carry (if 
any) from the left so the correct instruction here is ROR. Assuming that the 
data is in two bytes of memory, the coding is therefore: 


LSR_ high-byte 
ROR low-byte 


Branching techniques 


The equivalent of the dreaded GOTO in BASICisJMP. Thejump toa new part 
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of the program is unconditional and, because JMP has a two-byte operand, 
can reach any part of the 64K memory map. Appendix C lists seven 
conditional branch instructions. Acommon cause of a programming bug 1s 
an incorrectly used branch test allowing an unexpected loophole. The 
following points are worth emphasising: 


(1) Branch instructions themselves have no effect on the processor status 
register. Thus, two different branch instructions can follow one another so 
the original data can be tested for two conditions. 

(2) BMI or BPL should only be used if data is represented in two’s 
complement binary. They are meaningless in unsigned binary because 
there is no differentiation into positive or negative sets. 

(3) Before using a branch, make certain that the /ast operation actually 
updates the bits you are testing. In other words, check up on Appendix C2 
which includes a classification of all instructions according to their effect 
on the processor flag bits. For example, it may be pointless to use BCC after 
DEX because only the N and Z flag bits are updated. 


The limits of + 127 byte forward or—128 bytes backwards have been covered 
elsewhere. If the branch is beyond range (which should not be often) the 
customary solution is to combine the branch with a JMP. For example, 
suppose the branch is to be BNE LOOP and the label ‘LOOP?’ is out of 
range. The conventional way out is as follows: 


BEQ SKIP 
JMP LOOP 
SKIP 


Note that the opposite test (BEQ) is used instead of BNE so the jump is 
leap-frogged to the label SKIP. 


Comparisons 


It is often required to compare two numbers in order to set the status flags 
without altering the contents of the register. There are three instructions 
which perform this task, all of which set the N, Z and C flags: 


CMP, which compares memory with the contents of the accumulator. 
CPX, which compares memory with the contents of the X register. 
CPY, which compares memory with the contents of the Y register. 


The comparisons are done by subtracting the memory data from a copy of 
the register in question. The operational symbolism is therefore A-M, 
X-—M or Y—M respectively. It is easy to get mixed up with the direction of 
the subtraction, so note carefully that the subtraction is from the register. A 
suitable branch instruction must follow a comparison (otherwise there 
would be no point in asking for the comparison). It is possible to get in 
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some funny mix-ups. The following examples may help in choosing the 
correct branch: 


(1) To check if the register is /ess than memory, follow with BCC. 

(2) To check if the register is equa/ to memory, follow with BEQ. 

(3) To check if the register is greater than memory, follow with BEQ first 
then BCS. 

(4) To check if the register is greater than or equal to memory, follow with 
BCS. 


Addressing modes 


Commencing with a definition, an addressing mode is the significance to be 
attached to the operand part of the instruction. Addressing modes 
available on the 6502 can be conveniently divided into three groups: non- 
indexed, simple indexed, and indirect indexed. Most of these modes may 
already be familiar to most readers, especially those who have read 
Discovering BBC Micro Machine Code. However, some revision or 
restatements are advisable, if only to maintain continuity during the lead- 
up to the rather nasty (nasty to grasp, that is) indirect addressing modes. 
Appendix C3 classifies instructions according to the addressing modes 
available. 


Implied addressing 

This is the simplest addressing mode in the repertoire because memory is 
not involved, neither is an operand required. They are all single byte 
instructions, conveying full information by the op-code alone. They all refer 
to internal operations on the 6502 registers. Because most of them only take 
two clock cycles, they are, or should be, the popular choice wherever 
possible. 

Instructions which allow implied addressing and consume only two 
clock cycles are: CLC, CLD, CLI, CLV, DEX, DEY, INX, INY, NOP, 
SEC,-SED, SEL TAX, TAY, TSA, TAA and TAS. 

The following take more than two clock cycles: BRK, PHA, PHP, PLA, 
PLP, RTI, and RTS. 


Immediate addressing 

Memory is not involved because the operand is the data. All instructions 
using immediate addressing consume two bytes: one for the op-code and 
one for the operand. The standard assembler prefix to denote this mode is 
the symbol (#). For example: 


LDA #32 or LDA #&20 


Both are using immediate addressing. The first example is loading the decimal 
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number 32 into the accumulator while the second example loads hex 20. Whether 
to use hex or decimal is optional but the guiding rule is to choose the more 
natural form for the purpose in use. For normal numerical work, decimal 
would be the preferred notation but for AND, EOR or ORA masks, 
hexadecimal has more meaning. Although it may seem to be stating the 
obvious, the largest numerical operand is 255 or & FF because immediate 
addressing only allows a single byte operand. Risking another obvious 
statement, the assembler would be very unhappy if we tried to load 
negative numbers in the form LDA #32. 

Immediate addressing is used for constants, particularly in conjunction 
with comparison instructions at the end of a loop as, for example, CMP 
#20. The constant must, of course, be knownto the programmer at the time 
of writing. In BASIC, we are usually extolled to avoid constants within the 
body of the program, the advice being to assign them to a variable at the 
head of the program. Such advice is not necessarily sound when applied to 
machine code because this would mean a trip to memory to obtain the data. 
The power of immediate addressing lies in the fact that memory is not 
involved: the data is immediate/y available in the instruction, providing, as 
said before, the programmer knows it at the time of writing. 

There are eleven instructions which allow immediate addressing: ADC, 
AND, CMP, CPX, CPY, EOR, LDA, LDX, LDY, ORA and SBC. 


Absolute addressing 

We should begin by sorting out some of the confusing terms used by 
different authorities. The term ‘direct’ addressing is often used loosely 
when the operand refers to the address of data, rather than the data itself. 
Thus, the instruction LDA &0034 is an example of ‘direct’ addressing (note 
there is no ‘# prefix). The instruction causes the contents of address &0034 
to be placed in the accumulator. However, bearing in mind that 6502 has.a 
64K memory map, it will be evident that addresses between &0000 and 
&OOFF would result in an inefficient use of memory space ifthe full four-hex 
digit address were mandatory. Since the data bus is only eight bits wide, the 
microprocessor would need to make two trips down the address and up the 
data bus to collect the full operand. The first two leading zeros are useless 
passengers. 

To improve the efficiency, the address space is broken down into two 
domains. As mentioned in an earlier chapter, addresses within the range 
&0000 to &00FF are designated the page zero domain, to distinguish them 
from all other addresses &0100 to &FFFF. With regard to the terms used, 
the Motorola 6800 (the ancestor of the 6502) used the term ‘direct’ 
addressing instead of zero-page addressing and ‘extended’ addressing to 
cover the rest. Many machine code programmers, brought up on the 6800, 
had to readjust to the change in terminology. Returning to the 6502, the 
term ‘absolute’ addressing is applied to addresses anywhere in the 64K 
memory map. In other words, absolute addressing requires four hex digits, 
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while zero-page addressing only requires two. Instructions using absolute 
addressing require three bytes, one for the op-code and two for the 
operand. 

There are 21 instructions which allow absolute addressing. These are: 
ADC, AND, ASL, BIT, CMP, CPX, CPY, DEC, EOR, INC, JMP, JSR, 
LDA, LDX, LDY, LSR, ORA, ROL, STA, STX and STY. 


Zero-page addressing 
The concept of zero-page (sometimes called page-zero) is so important that 
it justifies emphasising the boundaries once again. 


Zero-page is the address range &00 to & FF or 0 to 255 


There are reasons why this page deserves special treatment. There are 
obvious speed advantages, due to the single byte operand. This also leads to 
a saving in program memory space. Another reason is that the more 
complex addressing modes (to be dealt with later) require address pointers 
which must be in zero-page. Perhaps the most disappointing aspect is the 
scarcity of available space. The operating system, not surprisingly, 
occupies the vast majority of zero-page. In fact there are only thirty-two 
locations guaranteed free under all circumstances in the BBC machine: 


Free space in zero-page is between &70 and &8F inclusive 


Because of the restricted space, it is essential, before planning any 
ambitious machine code systems, to choose zero-page locations with care. 
The apparent speed advantage is not, in itself, sufficient to justify 
squandering locations. In fact, it is sound philosophy to treat zero-page 
locations in the same light as registers — as valuable and scarce 
commodities. A good rule is to use zero-page for the most frequently used 
variable data. Sometimes, it may be wise to use zero-page for data withina 
loop, even if it means temporarily transferring it from an absolute address 
and then back again. The advantage of this approach may be appreciated 
more readily if we examine a few figures. Suppose a variable data item, 
located in an absolute address, is in the middle of a long loop which 
revolves 10000 times. Suppose we then transfer it temporarily to zero-page 
before entering the loop by using: 


LDA &xxxx (absolute, 4 clock cycles) 
STA &xx _ (zero-page, 3 clock cycles) 
After the loop ends, the status quo can be regained with: 
LDA &xx (zero-page, 3 clock cycles) 
STA &xxxx (absolute, 4 clock cycles) 


The four extra instructions for the complete transfer have taken a total of 
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14 clock cycles and consumed an extra 10 bytes of programming space. The 
saving within the loop, however, would be | cycle per rev, leading to a total 
saving of 10000—-14 = 9986 clock cycles. We shall see later that indirect 
address pointers in zero-page will take two bytes each and many of these 
may be required in a program of even moderate complexity. 


Relative addressing 

The intimate details of relativeaddressing are only of vital importance if the only 
method of entry is via a machine code monitor. Since the BBC machine has the 
advantage of assembler input, it is not necessary to spend quite so much 
time on the subject. It would not be wise, however, to skip the subject 
altogether. If we did, the hex columns of the assembler output could often 
look mysterious. 

Relative addressing is only used with branch instructions. In fact, 
forgetting the assembler for a moment, relative addressing is the only 
method possible in branch instructions. Using hex machine code as an 
example, 


BEQ &04 


The literal meaning is ‘If equal to zero, branch 4 bytes forward’. The term 
‘relative’ refers to the program counter. If the branch conditions are 
satisfied, the program counter (which always contains the address of the 
next program byte) has 04 added to it. This causes the next byte executed to 
be 04 bytes ahead, relative to the previous position. In other words, the 
operand indicates the number of bytes to be skipped. To branch backward, 
the two’s complement is required (see Appendix A) so, to branch 04 bytes 
back, the instruction would be BEQ &FC. Clearly, the calculation of the 
correct operand is an error-prone exercise. The assembler takes all the 
drudgery out of relative addressing by allowing the operand to be a label 
instead of a relative address. We can use, 


BEQ Loop 


This works, subject to the proviso that the line, to which we wish to 
branch, is prefixed with the ‘Loop’ label (naturally, the choice of label is 
arbitrary). The assembler is hiding from us the fact that relative addressing 
is being used. Instead, it appears as a simple ‘branch to label’ operation 
which is far less error-prone than grappling with relative addressing. 

As for the timings of relative addressing, these will depend on whether or 
not the branch is taken. If taken, a branch takes 3 clock cycles or, if acrossa 
page boundary, 4 clock cycles. If it is not taken, the branch takes 2 clock 
cycles. 

The extra cycle when a page boundary is crossed is due to the alteration 
to the high- as well as low-byte of the addresses. If speed is very critical, a 
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programmer should watch the hexadecimal assembly listing closely to see if 
a page boundary is crossed. For example, suppose the program counter 
was showing &OS5FC prior to a branch. If the relative branch is &04 ahead, 
the new program counter reading would be &0600, therefore there has been 
a boundary crossing between page 5 to page 6 which consumes an extra 
clock cycle. If such a branch was in the middle of a loop which revolves N 
times, it would be sensible to manipulate the coding, or alternatively 
relocate, so that the branch range was limited to the same page, and saving 
N clock cycles. It is surprising how attention to such small details can result 
in a material gain in execution speed. Although terribly wasteful in terms of 
memory, it is better to cut loops out altogether and resort to straight-in-line 
coding if speed is absolutely vital. In most cases, this will be little more than 
an idealistic solution. 


Indexed addressing 

Although briefly discussed elsewhere, the concept of indexed addressing 
deserves detailed treatment. As far as the BBC assembler is concerned, the 
indexing mode is denoted by a comma following the operand, followed in 
turn by X or Y. For example: 


LDA &2356,X or LDA &75,Y 


Both are examples of indexed addressing but the first is using an absolute 
address and the second is using a zero-page address. The contents of the X 
(or Y) register is automatically added to the operand address before the 
instruction operates on the resultant address. Itis well to recap on the terms 
used in indexed addressing: 


(a) The base address is the address as stated in the operand. 
(b) The relative address is the contents of index register (X or Y). 
(c) The absolute address is the sum of the base and relative addresses. 


As an example, assume that X contains 3 and that the instruction LDA 
&34,X is written. The base address is 34, the re/ative address is 3 and the 
absolute address is 34+3=37. Alternatively, the term ‘effective’ is often 
used in place of ‘absolute’. 

Two forms of indexed addressing are recognised: 


(1) Absolute indexed, when the operand is any address in the 64K 
memory map. The instructions allowing X as the index register are ADC, 
AND, ASL, CMP, DEC, EOR, INC, LDA, LDY, LSR, ORA, ROL, 
SBC, and STA. The Y index register can be used in ADC, AND, CMP, 
EOR, LDA, LDX, ORA, SBC and STA. 

(2) Zero-page indexed, when the operand is on page-zero. The instructions 
which allow X as the index register are ADC, AND, ASL, CMP, DEC, 
EOR, INC, LDA, LDY, LSR, ORA, ROL, SBC, STA and STY. There 
are only two instructions which allow the Y register for indexing. They 
are LDX and STX. 
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A mysterious bug can occur when using zero-page indexed addressing if 
the contents of X plus the operand address come to more than 255 or &FF. 
Clearly the single byte operand cannot hold numbers of this value so a 
wrap-around takes place. For example, if the instruction is LDA &FE and 
X contains 2, the arithmetical sum would be &100. The wrap-around 
action, however, will mean that the first hex digit is dropped and the 
absolute address will be &00 instead of &100. 

Indexing allows any item in a block of data to be addressed by suitable 
adjustment of the index register. The operand of an indexed instruction 
(the base address) can be the address of the first item in the block or the last, 
depending on convenience or the programmer’s whim. For example, if the 
base address is to be the start of the block, the index register can be 
incremented (by INX) within the loop until the last item is reached. On the 
other hand, it may be more convenient to choose the end of the data as the 
base address, in which case the index register is decremented (by DEX) until 
the first item is reached. Decrementation of the index register towards zero is 
generally recognised to be the more efficient method because the end-of- 
loop test can be carried out by a simple branch, such as BNE. The 
incrementation method demands a comparison (CPX or CPY) before the 
branch test. However, program legibility is sometimes more important 
than speed. There is a natural inclination to count up towards a finite limit 
rather than to count down towards zero and there is less chance of being | 
out in the count. 

Besides accessing a data block sequentially, indexing is useful for /ook- 
up tables. For example, imagine a table of sines (or other mathematical 
functions) between, say, 0 and 89 degrees to be stored in a data block and 
the base address is where sin(zero) is located. The table can be accessed by 


LDA base, X 


If the required angle is in X, the sine of the angle will be in the 
accumulator. The limitation of 8 bits for each sine will only give an 
accuracy to about two decimal places unless multi-byte working is used. 
Also, the programmer must take account of the decimal point when 
interpreting the result. Obviously, it would be absurd to use this method in 
place of the resident BASIC trig functions unless high speed access is vital. 

Indexing is really address modification made easy. Besides being 
interesting, it is worth examining an alternative method (which was, 
historically, used before index registers were thought of ) involving direct 
modification of the operand. This consists of loading the operand of an 
instruction into the accumulator (or other register), changing its value and 
then returning it to the previous location. To see how this works, consider 
the following line: 


STA blogs 


The operand has an arbitrary symbolic address. If this were in a loop and 
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we wished to store the next item in blogs+ | without using indexing, it could 
be achieved as follows: 


.Modify STA blogs 
INC Modify+ 1 


Note that the original line has now been given an arbitrary label ‘Modify’ 
which is where the op-code STA is stored, so blogs must be located in the 
next address, modify+ 1. The next lineincrements the contents of blogs+ | so 
we have achieved ‘address modification’ by a roundabout method. If the 
change is to be more than just a simple increment - say, adding 7 - the 
coding could be as follows: 


.Modify STA _ blogs 
LDA Modify+1 
ADC #7 
STA Modify+1 


Such direct alteration of an operand by the program itself is sometimes 
useful, but it is not a practice to be recommended. Listings of machine code 
are never easy to follow and these sorts of tricks can only add to the general 
confusion. It is worth emphasising that the primary function of an index 
register lies in the ability to alter the effect of an operand and without 
altering the operand itself. One disadvantage of the 6502, which soon 
becomes evident in the early stages of programming, is the limit of 8 bits. 
This, of course, restricts the range of addresses which can be scanned by 
indexing even when absolute indexing is used. 


Indirect addressing 

Mastering any subject consists of systematically overcoming the various 
intellectual hurdles which appear during a course of study. Student drop- 
outs may occur when a hurdle is reached which is just too high. In machine 
code programming, there are many hurdles to overcome but the one which 
is responsible for the greatest student drop-out ratio is the concept of 
indirect addressing. Indexing 1s relatively easy to grasp once the advantages 
of address modification are realised but the following definition may help 
in understanding why difficulties arise in indirect addressing. 


An indirect address is the address of an address. 


In assembly language, indirect addressing is indicated by enclosing the 
operand in parentheses as follows: 


LDA (operand) 


Note that although the operand is indeed an address, it is where the 
computer must go to find the address of the data. We shall continue for the 
moment to use LDA in examples, but it should be mentioned that simple 
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indirect addressing as described above is only available with one 
instruction, JMP. Providing this is borne in mind, there is no harm in 
continuing with LDA in the initial stages. Consider the instruction: 


LDA (&70) : 


Because of the parentheses, &70 is an indirect address, referring the 
computer to go to a double-byte address &70 (low-byte) and &71 (high- 
byte). This double-byte address is known as the address pointer because it 
‘points’ to where the required data is located. Continuing with the 
examples, suppose that address &70 contains &35 and address &71 
contains &0D. Returning now to the original instruction, LDA (&70), it 
should now be apparent that the contents of address &0D35 will be loaded 
into the accumulator. We will further assume that &0D35 contains &56 
Let us recap, using this example to illustrate the terms once more: 


The instruction was LDA (&70). 

The indirect address is &70. 

The address pointer is &0D35. 

The data pointed to and finally loaded into the accumulator is &56. 


Figure 3.1 may help in the understanding of the above example. 


Accumulator result 


LDA (& 70) 





Fig. 3.1. Data flow in indirect addressing. 


When first introduced to the idea of indirect addressing, it is difficult to 
grasp the use of it. It appears to be a complicated and tortuous path to 
follow, merely to place data in the accumulator. For instance, it is 
understandable, and pertinent, to ask why the line in the above example 
couldn’t have been written in the simpler absolute addressing form: 


LDA &0D35 


After all, it may be argued, both forms would have identical effects. They 
would both load the same item of data into the accumulator, but the second 
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form would not be wasting a valuable location (&70) in zero-page and 
would certainly be quicker to execute. The answer to this lies in the ability 
of indirect addressing to alter the effect of an operand without altering the 
operand itself. You will remember that this quality was the fundamental 
justification for the use of indexed addressing. If the address pointer is 
changed in any instruction using indirect addressing, the effect of the 
instruction acts on a different location. This has far-reaching advantages, 
particularly when writing general purpose machine code subroutines. 
Clearly, when writing a subroutine intended to act on a block of data, it 
would be restrictive to force the writer of the program using the subroutine 
to always place the data in a fixed memory block. However, with indirect 
addressing, all that is necessary is for the main program to know where the 
address pointer is (&70 and &71 in our previous example) and load it with 
the starting address of the data block. This flexibility means that the writer 
of the machine code subroutine need have no knowledge of the where- 
abouts of the eventual data block. 

Before proceeding further, it should be remembered that the descriptions 
so far have been simplified by assuming that a 6502 has the instruction 
LDA (operand). Apart from the single instruction, JMP, simple indirect 
addressing is not supported. Instead, we have the added benefit (and 
unfortunately, the added complication) of indirect addressing combined 
with indexing. In fact, there are two forms to choose from, called ‘indirect 
indexed’ and ‘indexed indirect’. 


Indirect indexed addressing 
This is the form most often required. Only the Y index register is allowed in 
this mode. The assembler form 1s: 


LDA (operand), Y 


The operand is single byte and therefore can only refer to a zero-page 
address. 

The only difference between this mode and simple indirect addressing is 
the addition of the Y register contents to the address pointer. That is to say, 
the operand still defines where a double-byte pointer is located but the 
pointer is modified by the addition of the Y register contents. As an 
example, assume that the following line is written: 


LDA (&70), Y 


Also assume that the contents of address &70 contains &35, address &71 
contains &0D and the Y register contains &02. The effective address 
pointer will be &0D35 + &02 = &0D37. The effect of the instruction is 
therefore to load the contents of address &0D37 into the accumulator. The 
example figures can be used to define a few more terms connected with 
indirect indexing: 
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The instruction was LDA (&70), Y. 

The base address pointer was &0D35S. 
The relative offset in Y was &02. 

The effective address pointer was &0D37. 


Figure 3.2 illustrates the example. 


Accumulator result 


Y register 


LDA (& 70), Y 





Fig. 3.2. Data flow in indirect indexed addressing. 


Indirect indexed addressing allows the effect of the operand to be altered 
in either of two ways, by changing the base address pointer, by altering the 
contents of the Y register or both. The index register should be looked upon 
as an optional extra because there is no need to use it actively. Forexample, 
if Y is reset to &00, the instruction, 


LDA (&70), Y 


has the same effect as the simple (but fictitious) indirect addressing 
example given earlier: 


LDA (&70) 


However, an obvious use of indirect indexing lies in sequencing through 
a block of data items by incrementing or decrementing the Y register. It is 
helpful to distinguish simple indexed loops from indirect indexed loops by 
considering under what circumstances they would be used: 


(a) Use simple indexing if the base address is known and constant. 
(b) Use indirect indexing if the base address is not known at the time of 
writing or 1s liable to require changing. 


One advantage of indirect addressing not yet mentioned is the ability to 
reach any part of the 64K memory map by use of a single-byte operand. 
This is because the address pointer in zero-page is double-byte (16 bits). 

The following example is outline coding to perform a process ona block 
of memory with just sufficient detail to illustrate indirect indexed 
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addressing. Assume that the address of the first data item has been prior 
assigned to the address pointer in &70 (low-byte) and &71 (high-byte) and 
the length of the block minus | is 20. 


LDY #20 
data LDA (&/70),Y © 


process 


ee g j 
BPL data 


rest of program 


The example should require little explanation, except perhaps to note 
that the indexing proceeds downwards towards zero, so the processing 
begins with the last data item and finishes with the first. As mentioned 
earlier, a downwards scan enables the end of the loop to be tested without 
the use of a CPY. 

Some variations in the jargon exist. The alternative name for indirect 
indexed (and in some ways more informative) is ‘post-indexed’ indirect 
addressing because the indexing is done after the indirect address has been 
found. Also, address pointers are sometimes called address vectors. 

Indirect indexed addressing is available with ADC, AND, CMP, EOR, 
LDA, ORA, SBC, and STA. They all take 5 clock cycles except STA which 
takes 6. If a page boundary is crossed, they take an extra clock cycle. 


Indexed indirect addressing 
This mode doesn’t enjoy quite the same measure of popularity as indirect 
indexed. The assembler form 1s: 


LDA (operand, X) 


Note carefully the position of the parentheses, that X is inside instead of 
outside and only X is allowed for indexing. As before, the operand must be 
single-byte so can only refer to a zero-page address. 

X is shown within parentheses to emphasise the manner in which 
indexing is carried out. The behaviour of indexed indirect addressing 1s as 
follows: 


The address of the pointer in indexed indirect addressing is the sum of 


the operand and the contents of X. 





This definition may explain why an alternative name of this mode is ‘pre- 
indexed’ indirect addressing. To aid understanding, first study the 
following numerical example: 
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LDA (&70,X) 


In the first instance, assume that X 1s zero. The pointer is then the double 
byte address which happens to be in &70 (low-byte) and &71 (high-byte). 
However, if we assume that X contains &02, the address pointer is located 
at the double-byte address &72 and &73. Proceeding with this example, 
suppose that &72 contains &35 and &73 contains &0D, the instruction 
would load the accumulator with the contents of address &0D35. The 
example is illustrated in Fig. 3.3. 


Accumulator result 


LDA (& 70, X) 


X register 


Fig. 3.3. Data flow in indexed indirect addressing. 


Until familiarity is gained, it is easy to get mixed up with the two indirect 
modes because of the relatively superficial differences in the assembler 
form. In order to emphasise the difference in form and effect, it is worth 
viewing the two side by side: 


Indirect indexed (post-indexed indirect ) addressing keeps the pointer 


at a constant location but uses Y indexing to modify the pointer 
value. 








Indexed indirect (pre-indexed indirect) addressing uses X indexing to 
modify the operand, and hence, the location of the address pointer. 


As hinted earlier, indexed indirect addressing is not a commonly used 
mode. One area in which it is valuable is in handling peripheral interrupts. 
The course of a program can often depend on the particular peripheral 
which has requested interrupt. For example, the data sent to a printer will 
originate from a different area than the data sent to a digital-to-analogue 
converter. Assuming there are two peripherals on line, then we can arrange 
to have two separate address pointers to service them, located in zero-page. 
Suppose these double-byte addresses occupy the four locations &72, &73 
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and &74, &75 and consider the following line: 
STA (&70,X) 


The value placed in X must be that which modifies the operand to locate 
the desired address pointer. Care should be taken when calculating the 
value of X. The indirect address pointer is a two-byte address, so X must be 
changed by two at a time, otherwise the instruction above will define the 
high-byte instead of the low-byte. For example, if X is initially zero, the 
address pointer selected is located at &72, &73. If X is incremented only 
once, there is a foul-up because the address pointer is taken to be &73, &74 
which 1s the high-byte of the first pointer and the low-byte of the second. 

Apart from handling peripherals, indexed indirect addressing can be 
used to simulate the CASE statement found in some of the structured 
languages or the ON GOTOin BASIC. Control can be switched to separate 
machine code processes, each switched by a unique address pointer. The 
value in X determines which process is activated. 

Indexed indirect addressing is available with ADC, AND, CMP, EOR, 
LDA, ORA, SBC and STA. 


Summary 


1. A machine code instruction always has an op-code but not all have 
operands. 

2. The op-code defines the required action; the operand indicates where 
data is to be found. 

3. Addressing modes are various ways in which operands express 
location of data. 

4. The computer recognises only binary op-codes expressed as two hex 
digits but the resident assembler allows three-letter mnemonic groups. 

5. The precise effect of an instruction is more concise if written in opera- 
tional symbolism rather than words. 

6. During transfers, source data remains intact but old data at the 
destination is overwritten. 

7. In normal use, the carry is cleared before adding but set before 
subtracting. 

8. In double or multiple byte arithmetic, clear carry only before adding 
the lowest order bytes and set carry only before subtracting the lowest 
order bytes. 

9. Memory or registers are cleared by a load zero. There are no CLR 
instructions. 

10. There are no instructions to increment or decrement A. 

11. Use AND to clear, ORA to set and EOR to change selected bits within 
a byte. 

12. To flip over all bits, exclusive—or with & FF. 


13. 
14. 


15. 


16. 


7. 
18. 


19. 
20. 
Zi. 
pape 
23. 
24. 
ZS. 


26. 
Zi. 


28. 


29. 


30. 


Sl. 


32. 


3. 


34. 


35. 


36. 


37. 
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To produce two’s complement, flip first and then add I. 

To find the state of a single bit, mask out uninteresting bits using AND 
and test for zero. 

The BIT test copies bit 6 and 7 of the data into V and N bits respectively 
and ANDs the data into A. 

LSR has the carry bit at the Isb end; ASR has the carry bit at the 
msb end. | 

Only shift and rotate instructions have accumulator-addressing. 

In double-byte multiplication, use ASL for low-order and ROL for 
high-order byte. 

In double-byte division, use LSR first for the high-order then ROR 
for the low-order byte. 

The current state of the process status register determines whether or 
not a branch takes piace. 

Branch instructions themselves do not affect the process status register. 
BMI and BPL are only useful if two’s complement binary is used. 
If the branch is out of range, combine with JMP. 

In comparisons (CMP, CPX or CPY), the data is subtracted from the 
register in order to set flags but the original contents are restored. 
To check if the register is less, use BCC; to check if equal use BEQ; to 
check if greater, use BEQ first then BCS. 

Implied addressing has no operand. 

Immediate addressing is when the operand, which must be single byte, 
is the data. 

Absolute addressing is when the operand, which must be double-byte, 
is the address of the data. 

Zero-page addressing is when the operand, which must be single byte, 
is the page-zero address of the data. 

There are only 32 addresses guaranteed left free by the operating 
system, &70 to &8F inclusive. 

Relative addressing, used only with branch instructions, is when the 
operand signifies how many bytes away is the next instruction. 
Two’s complement arithmetic is used to cover forward and backward 
branches. With the assembler, branch-to-label is possible. 

Absolute indexed addressing is when the operand (which must be 
double byte) plus the index register, is the address of the data. 
Zero-page indexed addressing is when the operand (which must be 
single byte) plus the index register, is the address of the data. 

In an indexed instruction, the operand defines the base address, the 
index register the relative address. The sum of the two is the absolute 
or ‘effective’ address. 

The operand in simple indirect addressing is the address of the lower 
order byte of a two-byte address pointer. Only JMP offers this mode. 
JMP excepted, address pointers can only reside in zero-page (page- 
zero). 
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38. 


39. 


40. 


Advanced Machine Code Techniques for The BBC Micro 


Indirectly indexed addressing modifies the address pointer by the 
addition of Y. The assembler operand format is (operand), Y. 
Indexed indirect addressing modifies the address of the address 
pointer by the addition of the X register. The operand assembler 
format is (operand,X). 

Address pointers are also called vectors. 


Self test 

3.1. Using three lines, multiply data in the accumulator by 3. 

3.2 Write the instruction to clear bit 5 in the accumulator. 

3.3. Write the instruction to change bits 3 and 6 in the accumulator. 

3.4 Write the instruction to set bit 2 in the accumulator. 

3.5 If the accumulator initially contains 17, what will it contain (in hex) 
after EOR #& FF? 

3.6 What is wrong with LDA #&23DF? 

3.7. Which 6502 register is involved in relative addressing? 

3.8 In the BBC machine, where are the 32 free locations in page-zero 
(answer in decimal address range)? 

3.9 Ifthe effective address in the instruction EOR &73,X is &84, what is 
the relative address in hex? 

3.10 Name the one instruction in the 6502 which offers non-indexed in- 
direct addressing. 

3.11 Which index register is allowed in indexed indirect addressing? 

3.12 What addressing mode is being used in STA (&75), Y? 

3.13 In the instruction ADC (&73),Y where is the high-byte of the address 
pointer located if Y contains 6? 

3.14 In the instruction ADC (&73,X) where is the low-byte of the 
address pointer located if X contains 6? 

3.15 What is an alternative name for indexed indirect addressing? 


Chapter Four 


Handling the Resident 
Assembler 





The assembler format 


An assembler is essentially a piece of software which aids the writing of 
machine code. Most personal computers offer only the crudest facilities for 
using machine code. Some only have a ‘machine code monitor’. Others are 
even less equipped and the only way to enter a machine code program Is by 
means of a boring and error-prone series of POKEs. It is not surprising that 
few owners of such machines develop a strong attraction for machine code. 
Of course, assemblers for most popular microcomputers are available on 
tape or disk but it is a sad fact of life that external software which has to be 
loaded, particularly from tape, is often too much trouble. Initially, it may 
be used with enthusiasm but the inevitable ‘tape-inertia’ syndrome 
eventually relegates the tape to its coffin. 

Assemblers vary in sophistication and the facilities offered for 
debugging. It is unusual for a personal computer to be equipped with a 
resident assembler. No doubt manufacture’s of machines for this market 
have previously assumed that few buyers, other than the completely 
dedicated, would be interested in any programming language other than 
BASIC. Acorn, conscious that interest in machine code would grow, 
included a resident assembler in the Atom, a practice which they have 
continued in the BBC machine. 

The assembler used is unique because it is ‘wedged’ inside the BASIC 
chip. The position of the assembler inside the language system ensures easy 
transition between BASIC and machine coding. Machine code splices 
withina BASIC program are recognised by the /arge square brackets| and]. 
(These appear on the screen as left- and right-pointing arrows in Mode 7 
on the BBC machine.) Although the assembler lacks many of the 
refinements found in traditional mainframe-oriented software, it is 
perfectly adequate and quite easy to use. In fact, some features of its design 
could be considered superior to classical assemblers. Although many 
readers will already be familiar with the facilities offered, it is necessary, for 
the sake of continuity, to devote a little space to the following overview. 
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Mnemonic op-codes 

Bearing in mind the function of an assembler defined at the beginning of 
this chapter, the foremost requirement of any assembler is the substitution 
of meaningful letter groups for the instruction op-codes. Thus, to transfer 
X to A, the hex machine code is &8A (see Appendix C1). The assembler 
allows us to write TXA instead. All mnemonic op-codes are three-letter 
groups. 


Numerical operands 
Numerical values in operands are assumed to be decimal. If the number is 
to be interpreted as hex, they must be prefixed with the & character. 


Operand variables 
The most remarkable and useful property of the assembler is the lattitude 
allowed with variables. The following deserves emphasis: 






_ Any legitimate BASIC variable or expression can be used in the | 
operand. | F aay If 





Examples: 


i100 
110 
1270 
10 
140 


130 


LDA 
LDA 
LDA 
LDA 
LDA 
LDA 


LOCK 

LOCK+1 
#ASCC"Y") 
#BASE DIV 256 
#BASE MOD 256 
#SOR (4) 


These are all expressions which the assembler will accept although the 
following common sense provisos apply: 


(1) The operand expression must be capable of intelligent decoding. That is 
to say, the resultant must be an address or data acceptable as an operand. 
(2) Registers (A, in the above examples) can only hold one byte so it will be 
up to the programmer to ensure that the data, represented by the variables, 
is within this limit. 

(3) It is essential that the variables be pre-defined in the BASIC program 
area. For example, we cannot write A=30 inside an area enclosed by the 
square brackets [.....]. 

(4) Although BASIC expressions can be used, BASIC commands are most 
definitely illegal. The assembler would ruthlessly reject lines like PRINT A 
or DEF PROCsort. BASIC commands or statements must be confined to 
the BASIC area. | 


BASIC variables and expressions can also be used in jump operands 
such as: 


JMP START+7 
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This will cause a jump to the address found by adding 7 to the contents of 
the variable START. | 


Branch labels 

The previous chapter, dealing with branch instructions and relative 
addressing, stressed the difficulties associated with counting the number of 
bytes forward or backwards to reach the desired branch destination. The 
assembler, to a large extent, offers relief by allowing the use of /abels. Thus 
the assembler disguises the inherent relative addressing and substitutes a 
straightforward ‘branch to label’. For example, we can write: 


BNE Sort 


Here ‘Sort’ is the branch operand, referring to a label appearing somewhere 
in the program. 

The label is recognised as such by the assembler because it must begin 
with a full-stop as shown in the following example: 


CLC 

- BACK LDA Number i 
ADC Number 2 

BE@ Finish 

ROR A 

BNE BACK 

«Finish RTS 


The coding obviously has no practical value so it would be pointless to 
key in. Note the full-stop before the two labels and note also that there must 
be no full-stop when the label is an operand directive. The layout is valid 
but considerable latitude is allowed. For example, it could be re-written as 
follows: 


CLC 

- BACK 

LDA Number 1 

ADC Number 2 

BEQ Finish 

ROR A 

BNE BACK 

«Finish RTS 

Notice that the label can stand alone or be on the same line as the 

instruction, providing there is at least one space separation. The label, 
although existing within the assembled coding, can also be considered to be 
a normal BASIC variable. It is, after all, an address andtherefore a simple 
numeric. We could, for instance, discover this address (when out of the 
assembler and back in BASIC) by writing PRINT BACK or PRINT 
Finish. A problem arises, which we shall deal with later, when 
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assembling coding in which branches are made to forward positions. 


Remarks 
Remarks, which programmers feel are necessary to explain coding, must be 
distinguished by either the semicolon or the back slash (\). For example: 


LDA (pointer, Y) \Indirect loading 


We shall use the back slash because it seems a more meaingful symbol and 
also because the semicolon is strongly associated with the PRINT 
statement in BASIC. Remember that the back slash in the BBC machine 
Mode 7 appears on the screen as a funny ‘!/” character. 


Multi-statements 
As in BASIC, more than one assembly instruction can be placed on one line 
by using the full colon as a separator. For example: 


[ 
CLC:ADC Number:ADC #4:RTS 
J 


It must be appreciated that the square brackets, which enclose assembly 
coding, have the status of an instruction, even though they are categorised 
as ‘pseudo instructions’. 

Writing many statements per line is popular in BASIC because programs 
appear shorter in length and less bytes are squandered in storing line 
numbers. In assembly code, although the length of the program still 
appears less, there is no real saving because the action of assembly produces 
the same final object program. Another disadvantage of multi-statements 
per line is the decrease in readability. A program written in assembly code is 
not exactly an easy thing to decipher (even for the writer!) and cramminga 
lot on one line increases the confusion, besides leaving less room for 
remarks. 


Storing assembly code 

When writing a BASIC program, we have no worries about where it will be 
stored in memory. It is left entirely to the operating system, which stores it 
in accordance with built-in rules. However, when a piece of assembly code 
is written, the assembler has no such built-in authority. It is up to the 
programmer to provide guidance on the desired starting location of the 
program in memory. Naturally, the guidance given must steer clear of the 
RAM space already earmarked by the operating system. There are several 
reasonably free areas in the BBC machine, besides space madeartificially by 
shifting the BASIC text and variable areas around. Before discussing the 
details of free spaces, it is important to stress the vital importance of one of 
the special resident integer variables P%. As readers are already aware, the 
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complete set of these twenty-seven special variables are labelled @%,A% 
... Z%. They all occupy fixed locations and can be picked up at any time by 
the operating system or assembler. Returning to P%, 


he contents of P% are taken as the address of the next item of © 





Thus, to locate a program anywhere in memory, it is simply a case of 
loading the starting address of the first byte into P%. After each byte is 
stored, P% is automatically incremented by one in readiness for the next 
byte. In fact, it is convenient to relate P% to the microprocessor program 
counter. 

The setting of P% must be carried out in the BASIC part of the 
composite program. For example: 


10 PAZ=%0D00 


2O [ 

20 \ 

40 \ 

vO \Assembly coding 
60 \ 

7O \ 

80 J 


90 REM rest of BASIC 


This would store the first byte of assembly code (wedged between BASIC) 
in the hex address 0D00. It is possible to use variable names when loading 
P% so we could have written the top bit of the previous example as: 


10 STARTZ=&0D00 
20 A PA=STARTA 


Instead of committing P% to a fixed machine address, it is possible to 
delegate some of the responsibility to the operating system. Page 237 of the 
BBC User Guide tells us that the Dimension statement can be twisted a 
little in order to accommodate byte arrays. For example: 


DIM START% 99 
PZ=STARTZ 






This will reserve 100 bytes for assembly code, the first byte located in 
START% (START% is, of course an arbitrary name). It is important to 
recognise the unusual character of the DIM statement in line 100. There are 
indeed two variations from the normal BASIC statement for dimensioning 
arrays. The first point to notice is the absence of the brackets around the 99 
although the space before 99 is mandatory. The second point, less vital but 
still useful to know, isthat START%should not be thought of as the ‘name of 
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the array’. It is merely the labelled address of where the first byte is stored. 
The storage area, like the normal DIM statement, is ‘dynamic’. That is to 
say, it moves up or down depending on the number of lines in the BASIC 
area. However, the actual machine address of the first byte in the example 
above can always be ascertained by a command, such as PRINT 
START%. The snag in using the DIM statement is the fact that you have to 
make an intelligent guess as to the number of bytes in your coding. The best 
way is to make a preliminary estimate and then add, say, twenty more for 
luck because it doesn’t matter if you over-estimate. However, if you area 
stickler for having things dead right, it is easy to count the bytes after the 
final debugging and alter the DIM number accordingly. 

Other ways of finding space for assembly code are by the use of the 
pseudo variables TOP and PAGE. Space (for example 256 bytes) can be 
found below the BASIC program by using PAGE in the following manner: 


100 PAGE=PAGE+256 
110 PZ=PAGE-256 


The first line pushes PAGE upwards, and therefore the start of the BASIC 
program. This will reserve 256 bytes for the assembly coding which, as line 
110 suggests, starts at the old PAGE position. If we wish to reserve some of 
the assembly area for, say, 20 data bytes, line 110 can be written as : 


110 PA=PAGE-236 


This leaves 20 bytes for data, the first item being in PAGE and the last in 
PAGE+19. The assembly program will follow, beginning at PAGE 
“FP ae. 

If we use LOMEM., the program and/or data can be positioned between 
TOP (the top of the BASIC program) and LOMEM (where BASIC stores 
its variables). (See the memory map on page 501 of the BBC User Guide.) 
For example: 


100 LOMEM=LOMEN+256 
110 PA=TOP 


This will reserve 256 bytes for assembly coding, the first byte beingin TOP. 
The User Guide warns us not to attempt to alter LOMEM in the middle of a 
program or the interpreter will lose track of the variables. 


Fixed locations which are potentially free 
There is nothing wrong with storing assembly programs and data 
‘dynamically’ as described above. However, storing in a fixed location has, 
at least, a psychological advantage. You always know exactly where the 
program is stored and this can be comforting. The trouble with fixed 
storage in the BBC Micro is its scarcity. 

The: BBC machine has plenty of fixed free space providing that some of 
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the normal and expansion facilities are sacrificed. For example, page &0D 
is perfectly safe and is actually described in the User Guide as space for 
‘user-supplied resident routines’. Unfortunately, this is subject to the 
proviso that no disk interface is used. It is also possible to use page & CO if 
we are prepared to sacrifice user-defined character definitions. Page & BO is 
yet another page available but this time we will lose the facility of 
programming those delightful red function keys. Use of any of these pages 
will therefore depend on individual needs, but it is useful to know that three 
contiguous pages are potentially free, representing a total memory range 
&0BO00 to &O0DFF. This is a hefty 768 bytes which, in machine code terms, is 
capable of supporting a complex program. 

However, for those with disk or other expansion options it is advisable to 
steer clear of fixed locations altogether. It is safer to use the DIM 
statement. Any subsequent programs given which use fixed locations (to 
simplify explanations) can always be modified to DIM methods. 


Operating the assembler code 


To execute a BASIC program, we use the keyword RUN. To execute a 
machine code program we use the keyword CALL. Both RUN and CALL 
are BASIC keywords, a fact which serves to emphasise the close 
relationship between the assembler and the BASIC interpreter. The simple 
word CALL is very powerful because it combines the role of parameter- 
passing with machine code execution. Before delving into great detail, it is 
useful to study the following sequence of events starting with the source 
code and ending with the final execution: 


1. A source code listing 
10 PAZ=20D01 
20 
=0 LDA #99 
40 STA SODFF: 
a0 J 
60 PRINT:PRINT"The contents of SODFF 
is "5; ?%ODFF 


Line 10 positions the code in page &0D, a quite arbitrary decision. Lines 30 
and 40 are the assembly code which loads the accumulator with decimal 99 
then stores it in &ODFF. Line 60 is BASIC and prints out the contents of 
this address, using the byte indirection operator. 


2. Running the program 


When we type RUN, the result is: 
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ODOO 
OD0O0 AD 63 LDA #99 
ODO2 8D FF OD STA &ODFF 


The contents of address &ODFF is 13. 


This is the work of the assembler and indeed is called an assembly listing. It 
has produced the correct machine code from our mnemonic source code. 
The first column is the hex address of the first byte on that line. The second 
column is the hex machine code consisting of the op-code, followed by the 
operand. Notice that the two-byte operand is low-byte, high-byte form and 
appears back to front. The third column is our original source code. 

Note carefully that we have RUN but the machine code doesn’t appear to 
have worked because the printout is 13 instead of the expected 99. This is 
because the machine code has only been assembled — that is to say, it has 
only been converted from our source code to a pure machine code form. 
But we have not yet told the code to be executed! Thus, the ‘answer’ we got 
of 13 was merely garbage that happened to be in &ODFF. If you try it, you 
will have a different garbage number (unless the law of averages break 
down). So, we need another step. 


3. Executing the machine code with CALL 
CALL &0D01 


Nothing visible happens but the machine code has been executed. To prove 
it, RUN the program again. This will produce exactly the same results as 
the previous RUN but with one important difference. The BASIC line at 
the bottom will now read: 


The contents of address &0D01 is 99 


The steps shown above have deliberately been separated in order to 
emphasise the difference between the assembly and the call processes. In 
practice, the CALL would normally be included under a line number in the 
program rather than activated by a separate command. For example, the 
listing above could be written: 


10 PZ=R0D01 

2O [ 

=0 LDA #99 

40 STA SODFF 

45 RTS ~ 

a0 J 

oo CALL ODO! 

60 PRINT:PRINT"The contents of SODFF 
is "53 ?SODFF 
- FO END 
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Apart from the extra CALL line, notice that RTS has been squeezed in. 
This should be considered the normal way to terminate machine code 
routines in order to ensure smooth control back io BASIC after the code 
has been executed. Without RTS, an error message from the assembler 
might (probably will) appear. Now when we type RUN, the program will 
automatically assemble the code, line 55 will execute it and RTS returns 
control back to BASIC at line 60. | 


Controlling the assembler output 
The pseudo operation OPT controls the activities of the assembler. It is 
fully described in the User Guide but is repeated (with less detail) here. 


The format is OPT pass, where pass is a variable which can be 0, 1, 2 or 3. 


If pass=0, the assembly listing is suppressed and no errors are reported. 
If pass=1, only assembly errors are suppressed. 

If pass=2, only assembly listing is suppressed. 

If pass=3, nothing is suppressed 


It is important to remember that OPT is not a BASIC keyword, 
consequently 1: is only legal within the square bracket area. If OPT is not 
used, it defaults to OPT 3. Since the previous examples have not used OPT, 
the assembly listing appeared and assembly errors might also have 
appeared. It is often inconvenient and purposeless to display these except 
during the development and debugging stage. 


The forward branch problem and 2-pass assembly 

If a branch instruction directs control backwards to a labelled line, the 
assembler can cope immediately because it has picked up the address of the 
label on the way. However, if the branch directs to a forward address, the 
assembler is confused. To see why, imagine what happens in the following 
piece of code: 


110 BNE exit 
120 

130 

140. exit 


The main job of the assembler is to change mnemonic op-codes and 
operands into equivalent hex numbers and addresses. So what happens 
when it reaches line 1 10? It can easily look up its conversion table to find the 
hex code for BNE (it is DO). But it can’t determine the hex address of the 
operand because it hasn’t yet reached line 140. Its intelligence is just not 
equal to the situation, so it gives up. 

The BBC assembler is not peculiar in this respect. It is a common 
problem in all but the most sophisticated versions. The standard way out is 
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to give the assembler two goes at it, a trick known as two-pass assembly. 
The first pass collects all the labels and addresses and the second pass 
uses them to produce the final assembly. 

It would be boring and error-prone if the operator always had to 
assemble programs twice. Fortunately, the FOR/NEXT loop in BASIC 
can be left to do the donkey-work. The following piece of code includes a 
simple forward branch and illustrates the two-pass assembly technique: 


100 FOR pass=0 TO 3 STEP 3 
110 PZ=R%0D00 
120 C 

i130 OPT pass 
140 LDA #3 

iS50 CMP #3 

160 BEQ Finish 
170 NOP 

180 .Finish RTS 
190 Jj 

200 NEXT pass 
210 CALL &ODOO 
220 END 


The code itself is purposeless and barely worth explaining. It is easy to 
see that the forward branch, (the object of the example), will always take 
place. Line 170 could have been any useless instruction so NOP is 
exceptionally well-qualified. Note the following points: 


(1) The FOR loop ensures that OPT 0 applies during the first pass. We 
don’t want a listing and we certainly don’t want the inevitable error 
‘unknown label’ to appear. 

2. During the second pass, OPT 3 applies so the assembly listing appears 
and any errors (there shouldn’t be any now) are reported. 

3. The assigning of P% is inside the FOR loop. This ensures that the second 
pass starts again on the same piece of code. If P% was assigned before the 
FOR statement, the second pass would try and assemble code following on 
from the first pass with unpleasant results. 

(4) The NEXT statement which closes the FOR loop must be outside the 
square brackets. 

(5) If assembly code is intermixed with BASIC lines, it is necessary to 
include an appropriate OPT each time, otherwise it would default to 
OPT 3. . 


Using variable names 

The facility to predefine memory locations with BASIC variables and use 
them in assembly code should be exploited to the full. Any dodge which 
makes assembly code look less formidable is worthwhile, even if it does 
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squander a few BASIC lines. For example: 


Number 1=&70 
Number 2=&7] 


Having defined these locations, the variable names can be used in 
assembly operands, rather than absolute hex addresses. The following 
example, which adds two numbers (limited to single byte), illustrates some 
of the interchanges possible between BASIC and assembler: 


i0 DIM STARTZ 40 

20 CLS 

30 INPUT"Enter first number “Number i 
40 INPUT"Enter second number “Number2 
390 Result=&70 

60 PZ=STARTZ 

7O 

0 ee 

90 LDA #Number 1 
i100 ADC #Number2 
110 STA Result 
120 RTS 
toe 3 
140 CALL STARTZ 
i50 PRINT ?Result 


Just for a change, the program is stored by courtesy of DIM, instead of 
using fixed free space. It is easy to get mixed up with the address of data and 
the data itself. For example, it may not be obvious why lines 90 and 100 
must have the hash mark denoting immediate addressing. Note also that 
line 150 prints the contents of ‘Result’. If ‘Result’ was printed, it would be 
an address rather than data in that address. 


Macros 


A macro is short for macro-instruction, a facility offered in some 
assemblers, whereby a block of instructions performing some task can be 
defined by name, and later treated as if it were a single instruction. 
Superficially, this description resembles an ordinary subroutine so it 1s 
important to compare them with a view to spotting the differences. 

A subroutine is written once, stored in some fixed location and called up 
by a special jump (JSR). A macro, on the other hand, is assembled in 
machine code each time it is used in the body of the program. For example, 
suppose the following is a macro in a fictitious (and using an equally 
fictitious format) machine: 
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Macro TDX. 
DEX 
DEX 
DEX 
End Macro 


The macro is first defined and given some arbitrary name (TDX in 
example). The macro is then written (three consecutive decrements in the 
example). The macro is then terminated. From now on, anytime we use 
TDX, the assembler will include the three instructions in the coding. Note 
that there is no ‘jump’ action. The macro is inserted in line every time. 
Because there is no time wasted in calling and returning from a subroutine, 
a macro has a higher execution speed. 


SLiISt 
10 REM Frocedures used as macros 
20 DIM STARTZ 40 
320 PZ=STARTZ 
40 (:LDA #1i:] 
20 PROCshiftlefts 
60 (:S5TA &70:] 
7O PROCsShHiftleft 3 
80 ([:STA &71 
90 RTS 
i100 Jj 
110 CALL STARTZ 
120 PRINT?&70 
130 PRINT?&/71 
140 END 
150 DEF PROCshiftlefts 
160 [C:ASL A:ASL A:ASL Az] 
170 ENDPROC 


>RUN 
19EF 
i9EF AS Ol LDA #1 
19F 1 
19Fi OA ASL A 
i9F2 OA ASL A 
19F= OA ASL A 
19F4 
1i9F4 85 70 STA &70 
19F6 
19F4 OA ASL A 
19F7 OA ASL A 
19F8 OA ASL A 
19F9 
19F9 85 71 STA &71 
19FBR 40 RTS 
g 
64 


Program 4.7 .Procedures used as macros. 
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Now for the crunch. There is no macro facility in the BBC machine, at 
least not directly. However, it is possible to wangle it by exploiting BBC 
BASIC’s most powerful asset, the Procedure. Instead of naming a macro, 
we name a procedure. Defining the macro is replaced by DEF PROCname. 
Naturally, we must define the procedure in BASIC and use it in BASIC, 
but the assembler is undaunted providing the square brackets are used 
correctly. | 

To illustrate, Program 4.1 loads the accumulator with | and then uses a 
simple ‘macro’ to produce three shift lefts on the accumulator. The macro is 
used twice, so the accumulator is shifted 6 times. After RUN, the assembly 
listing appears. You will notice that the three ASL instructions are indeed 
assembled in-line each time the ‘macro’ is used. This example should 
emphasise the difference between macros and subroutines. We have stated 
that a macro is faster and yet, from the assembly listing it is not too obvious 
why. Actually, the time it takes actually to assemble macros may indeed be 
longer than assembling the equivalent subroutine. Once assembled (which 
only has to be done once) however, the execution is faster with macros 
because no time is wasted in jumping and returning (JSR takes 6 clock 
cycles and so does RTS). Against this, however, it should be realised that 
macros use up memory for the assembly code each time they are called. 

The main hazard, when using procedures to simulate macros, is failing to 
observe the rules of the square brackets when dodging in and out of 
BASIC. The brackets can be round the wrong way or in the wrong places. 
Program 4.1 above uses a standardised format with the brackets enclosing 
the procedure call and on the same line. 

As with normal procedure calls and definitions, it is possible to pass 
formal parameters into the assembly enclosure. For example, we can define 
a procedure as follows: 

900 DEF PROCsubtract (number 1, number 2) 
2710 

4720 SEC 

330 LDA number i 

240 SBC number 2 

300 J 

260 ENDPROC 

Once such a procedure is defined, it can be used to subtract, say, contents 
of Tax from Gross by using: 


PROCsubtract (Gross, Tax) 
Naturally, such a simple procedure can only handlea single byte subtraction 
but this is irrelevant — it is the principle that matters. 


Conditional assembly 

This technique, like macros, is commonplace in professional mainframe 
assemblers. It simply means that certain changes can be made in assembly 
code without having to reassemble each time. Again like macros, it is not 
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directly available on the BBC machine but can be simulated by a mixture of 
BASIC and machine code. Asa simpleexample, it may be that during program 
development we would like to try the effect of X=4 or X=40 in the same 
program. A simple IF/ THEN/ELSE spliced in the right place would do 
the job nicely: 


IF Speed=slow THEN [ : LDX #4: JELSE[ : LDX #40: ] 
Passing parameters via registers 


There are various ways of passing parameters from the main program 
(which could be BASIC) to a machine code subroutine. The simplest, but 
not always the most advisable, is via registers. The three most important 
resident subroutines, OSRDCH, OSWRCH and OSBYTE (discussed in a 
later chapter) all use registers for parameter passing. OSRDCH and 
OSWRCH use the accumulator; OSBYTE uses the accumulator and the X 
and Y index registers. 

Apart from the enforced use in resident subroutines, registers are not the 
ideal medium for passing parameters. The 6502 is not generous as far as 
they will already be holding variable data — although, of course, the stack 
can be used as a temporary store while the registers are being used. 
Fortunately, alternatives are available in the shape of the USR 
and CALL statements. 


Passing parameters with CALL 


The CALL statement is far more powerful than has been indicated earlier 
in the chapter. For example, we can use: 


CALL NAME, G%, A$, Blogs 


This shows that, in addition to actually calling (executing) machine code, it 
is possible to pass a variety of mixed parameters (integer variables, string 
variables and floating point variables and even single byte numbers) to the 
assembly code. Essential information on the parameters is passed to a 
special memory block located in the BBC machine at &0600 onwards. This 
block contains the following information: 





& 0600 number of parameters passed in CALL statement 


&060 | low byte address of first parameter 
&0602 high byte address of first parameter 
&0603 code for parameter type 


&0604 low byte address of second parameter 
& 0605 high byte address of second parameter 
&0606 code for parameter type 
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&0607 low byte address of third parameter 
&0608 high byte address of third parameter 
&0609 code for parameter type 


This sequence is repeated for any further parameters. 
The parameter code which has been referred to above is as follows: 


0 8-bit byte (example ?Z) 

4 32-bit integer variable (example Volts%) 

5 5 byte floating-point number (example Blogs) 
81 string variable (example Name$) 

80 string at a defined address (example $Name) 


The reference above to simple variables, also applies to array variables: For 
example, A%(3) . C(3) or B§(5). 

The parameter block format always begins with the number of variables 
attached to the CALL. Thereafter, three bytes of information are given for 
each variable. If, for example, there are five variables in all, there will be 1+ 
(3X5 16 bytes of information, starting at &0600 and ending at &060F. It 
is worth stressing that the information given is not the data itself but the 
address to which the data has been transferred. These addresses are not 
constant and only the operating system will be aware of them. When sucha 
situation arises (in which only addresses are given) the data can be easily 
obtained by the use of indirect addressing. All that needs to be done is to 
treat the address information as pointers which can be passed to page-zero 
for use by indirect address action. It is possible to avoid indirect addressing 
to obtain this data by a series of LDAs and STAs but it is messy and 
inefficient. 

It is easy to be confused by all this so it is essential to attack the subject in 
gentle steps. We begin by writing a few lines of code just to test that the 
CALL statement is indeed operating as described above. This code is shown 
in Program 4.2. 

To keep the first example simple, only one parameter is passed in line 20. 
It is spread out in hex in order to appreciate the result more readily. Since 
we are demonstrating CALL, there must be some bit of machine code to 
call, so to maintain simplicity it is sufficient to use a NOP (we are not, at 
this stage, interested in the particular code). Line 90 is in BASIC and calls 
up the code, passing G% to the system. Lines 110 to 130 print out (in hex) 
the contents of the parameter block. On first running the program, it stops 
at line 140 and displays the assembly code and the contents of the 
parameter block which have the following significance: 


l number of parameters passed (just G%) 

IC low-byte address of where the first byte of G% is stored 
4 high-byte address 

4 parameter code (4—four-byte integer) 
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>LIST 
10 REM Passing variables via CALL 
20 64=%01234567 
30 PZ=&0D00 
40 START=P%2 
vO C 
60 NOP 
70 RTS 
BO J] 
90 CALL START,GZ 
100 PRINT 
110 FOR Block=&0600 TO &0603 
120 PRINT*Y?Block 
130 NEXT 
140 END 
150 Pointer=041C 
160 FOR Data=Pointer TO Pointer+s3 
170 PRINT ™~Data,™~?Data 


180 NEXT 
>RUN 
oDOoO 
ODOO EA NOP 
ODO1 60 RTS 
1 
1c 
4 
4 
>GOTO150 
41C &7 
41D 45 
41E 23 
41F 1 


Program 4.2. Passing variables via the CALL statement. 


Thus, we are now in the position of knowing that our data has been 
stored in address &041C. You may see from this why it was necessary to 
stop the program at line 140. The preliminary RUN gave us the address 
information. A GOTO 150 then executes the bottom program which 
displays the contents of the four addresses &041C to &041F. The original G% has 
therefore been successfully recovered although in the conventional reverse 
order (low-byte first). It would have been possible to use the indirection 
operator (!) instead of the FOR/ NEXT loops but the display would have 
packed horizontally instead of being in more readable, vertical separation 
steps. 

The next program (Program 4.3) goes a step further by showing indirect 
address pointers picking up parameter data from the call statement. The 
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A-LIST 

10 REM Passing variables via CALL (2 
) 

20 POINTER=&70 

30 RESULT=%80 

40 START=&0COO 

30 PZ=START 

60 

7O LDA &0601 \Store LB/HB 

80 STA POINTER \Address of volts 

90 ~=LDA &0602 \in zero page 

100 STA,POINTER+1 

110 Life) #0 

i120 LDA (POINTER),Y \indirect address 

i130) 6STA RESULT 

140 RTS 

i) 

160 INPUT"ENTER INTEGER "VoltsZ 

170 CALL START, VolitsZz 

180 PRINT"THE INTEGER PASSED WAS "RE 
SULT 


Program 4.3. Passing variables and use of indirect address pointers. 


idea is to pass the address given in the parameter block at &0600, 0601 to 
the page zero addresses &70, &71 where they will act as the address pointer. 
The program is best understood by starting at the three lines of BASIC at 
the bottom (160 to 180 inclusive). The objective, again deliberately kept 
simple, is to input an integer variable (Volts%), pass it through the CALL 
procedure and print it out again, merely to prove the points described 
above. At the top of the program three preliminary assignments are made 
but you should particularly note line 20. It is a convention that the label, 
assigned to the lowest byte, is used to refer to the whole number, 
irrespective of the number of bytes. Lines 70 to 100 transfer the address 
information (where Volts% is stored) into the two zero-page addresses. 
Notice the economy of using POINTER and POINTER+ | for the low- and 
high-byte respectively. This is why only the low-byte pointer was assigned in line 
20. This little dodge is useful and willappear frequently in subsequent examples. 

Line 120 is the most important of all because it illustrates the beauty of 
indirect addressing. Index register Y is first cleared because it has no 
meaning in this context. All we want is simple indirect addressing without 
indexing. Although indirect indexed addressing is used, we could have 
substituted indexed indirect addressing - providing that the X register, 
instead of the Y register, was cleared initially. 

Proceeding another step further, Program 4.4 adds two single-byte 
numbers, both of them passed via the CALL statement. BASIC is used to 
input the two numbers into A% and B% and then passed to the system 
by means of CALL ADD, A%, B%. Since there are two variables, we 
reserve space for the address pointers in page-zero. The two assignments 
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>LIST 

10 REM 88-bit integer addition 

20 MODE 4 

30 FIRST=&70: SECOND=&72 

40 RESULT=%&80 

30 ADD=&0CO0O0 

60 FOR PASS=0 TO 3 STEP 3 

70 PZ=ADD 

80 CLOPT PASS 

90 LDA &0601 

100 STA FIRST 

110 LDA &0602 

120 STA FIRST+1 

130 LDA &0604 

140 STA SECOND 

150 LDA &0605 

160 STA SECOND+1 

170 LDY #0 

180 CLC 

190 LDA (FIRST),Y \Add pos integers 
200 ADC (SECOND),Y \indirect address 
210 STA RESULT,Y \Indexed address 
220 RTS: 1] 
230 NEXT 
240 PRINT 
250 PRINT"Addition of two unsigned int 

egers ":PRINT 

260 INPUT"First unsigned integer ",AZ 
270 INPUT"Second unsigned integer ",BZz 
280 CALL ADD,AZ,BZ 
290 PRINT"Addition= "5; 7RESULT 


Program 4.4. Single-byte addition. 


for this (appropriately labelled ‘FIRST’ and ‘SECOND’) are made in line 
30. Space is also reserved for RESULT in line 40. Lines 90 to 120 transfer 
the address information of A% in &0601, &0602 to &70 and &71. Lines 
130 to 160 perform a similar task for B%. After clearing Y and the carry 
bit, lines 190 to 210 perform the addition, again using indirect addressing. 

Although unnecessary (there are no forward branches in the assembly 
code), the two pass assembly process has been included for the first time. It 
is a good habit to cultivate, even when unnecessary, in order to maintain 
consistency in program layout. 


Passing variables by means of USR 
Although CALL is ideal for passing parameters to the assembler, it lacks 


the facility for directly returning parameters back from the assembler. The 
USR_-function is sometimes a more convenient, although less versatile, 
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option. The general format of the USR function is as follows: 


Result=USR (calling address) 
Examples: D%Z=USR (START), Blogs= USR(&0D00) 


Unlike CALL, the parameters to be passed must first be assigned to the 
four resident integer variables. A%,X%, Y% and C%. When USR is used, the 
information in these variables is transferred to the microprocessor registers 
of the same name with C% going into the Carry flag. The transfer is subject 
to the following provisos: 


(1) Only the low order bytes of A%,X% and Y% can be passed to A,X and Y 


registers. | 
(2) Only the least significant bit of C% can be passed to the C flag in the 


processor status register. 


>LIST 
10 REM Sbit INTEGER ADDITION 


20 REM DEMONSTRATING USR 
30 MODE4 
40 STORE=&70 
20 ADD=&0C00 
60 FOR PASS=0 TO 3 STEP 3 
70 PZ=ADD 
80 COPT PASS 
90 STX STORE \STORE X REG 
100 LDX #0 \SET HIGH BYTE TO O 
110 CLC 
120 ADC STORE 
130 BCC FINISH \NINCREMENT RESULT 
140 INX \NHIGH BYTE IF CARRY 
150 .FINISH \IS SET 
160 RTS: ] 
170 NEXT PASS 
180 PRINT 
190 PRINT"ADDITION OF TWO UNSIGNED 8bi 
t INTEGERS" 
200 PRINT 
210 INPUT"FIRST UNSIGNED 8bit INTEGER 
"AZ 
220 INPUT"SECOND UNSIGNED 8bit INTEGER 
" 3 ke 
230 RESULT=USR (&0COO) 
240 RESULT=RESULT AND &O0O0O0OFFFF 
250 REM MASK OUT UNWANTED BITS 
260 PRINT"ADDITION= "3; RESULT 


Program 4.5. 8-bit integer addition called by USR. 


After exit from the assembly code, a four-byte integer is returned to the 
result-variable supplied by the user. This integer is the composite contents 
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of the four registers P, Y, X, A in that order. For example, suppose we 
used: 


Blogs=USR (START) 


Suppose also that the assembly code, on exit, left the registers with the 
following contents: 


A=&FC, X=&67, Y=&FF and P (the process status register ) =&03 


On exit, the variable Blogs would contain &03FF67FC which, you will 
do well to note, is in the reverse order. The fact that the contents of the P 
register form the most significant byte of the result is, in some respects, 
unfortunate. This register is normally dedicated to flag bits and some 
degree of fiddling is necessary if it is ever called upon (indirectly) to 
contribute meaningful numerical information to the result-parameter. 

Program 4.5 illustrates the use of USR by adding two single-byte 
numbers in A% and X% respectively. The two numbers entered from the 
keyboard are placed in A% and X% by the BASIC lines 210, 220 before 
calling the code with USR. When two single-byte numbers are added, the 
result may spill over to two bytes but never more. Therefore the two higher 
order bytes of the four-byte result are so much garbage. Line 240 erases 
these bytes by use of the AND mask. There were no difficulties with the P 
register because it held one of the garbage bytes. However, many programs 
would require data contribution from P. There are only two instructions, 
PHP and PLP, which have direct action on the total contents of P. These 
can only push and pull to and from the stack. If we assume that the data to 
be placed in P initially rests in A, the following two lines of code illustrate a 
simple way out: 


PHA \Push A to stack 
PLP \pull A to P 


Although the method looks simple, there is a hidden danger. It can turn 
out to be a hazardous business to interfere with the processor status, 
particularly if the program allows interrupts. The original status, however, 
is restored by an RTS but care may be needed to preserve P before using the 
above. 


USR or CALL? 


The respective merits of USR and CALL can be summarised as follows: 


CALL: 

(a) CALL can pass any number of parameters, limited only by the available 
space from &0600 onwards. 

(b) The parameters are not restricted to integers. They can be any of thedata 
forms; including string array variables. 
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(c) The magnitude of the variables passed is not restricted to single bytes. 
(d) There is no provision in the CALL format for directly passing a result 
parameter (if there is one) back to the calling program. It must be arranged 
within the coding. 


USR: 

(a) Only three single-byte integers and one isolated bit can be passed as 
parameters to the subroutine. 

(b) A four-byte result can be directly returned. 


indirection operators 


The indirection operators form a grey area between high level language and 
machine code.They are similar to PEEK and POKE operations in 
traditional BASIC but are more versatile. However, this versatility is 
obtained at the expense of user-friendliness. The symbolism and format 
used can feel awkward for users hooked on BASIC. As far as machine code 
is concerned, the indirection operators are a boon because of the ease with 
which byte data can be pushed around memory. The definitions and format 
are well described in the user guides so only a brief outline (for the sake of 
continuity) is justified. 

There, are three operators and all may be taken to mean ‘the contents of 
.... For example, ?&0D00, means ‘the contents of address &0D00’ .A 
‘word’ is taken to mean four bytes at consecutive addresses. 


Byte indirection (7?) 
Word indirection (!) 
String indirection ($) 


All three operators must be placed before the address to which they refer. 
Some examples follow: 


Byte indirection 

(a) ?& X=&23 (b) ?&0DO0=5S (c) ?current=&456 (d) ?208=46 (e) PRINT 
2X (f) PRINT ~&0C00 

Note in (c) above that only the lower order byte (56) is assigned to ‘current’; 
the 4 is dropped because there is no room for it in a single byte. 


Word indirection 

One example is sufficient: 

!&0 DOO= & 23456789 

89 goes into &0D00,67 into &0D01, 45 in &0D02 and 23 in &0D03. 


String indirection 
$&0D00=“ABC” is an example worth examining in detail. The ASCII 
code for A (65) will be poked into &0D00, ASCII for B (66) in &0D0O1 and 
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ASCII for Cin &0D02. Note that the dollar sign comes first to distinguish 
it from a normal string variable. 


Saving and loading machine code 


If the machine code is written within a BASIC program enclosed in the 
usual square brackets, it can be saved or loaded in the normal way 
(SAVE“name” or LOAD“name”). However, once the machine code has 
been assembled (and you know the address of the first byte), it can be saved 
and loaded separately from the BASIC parts. The formats are as follows: 


To save machine code: 
*SAVE“name” start-address end-address+ | (addresses will be assumed hex). 
For example, if a machine code subroutine called “sort” is located between 
&0D00 and &0D20 inclusive, it can be saved by: 


*SAVE“sort” 0D00 0D21 
There is an alternative format: 

*SAVE“sort” starting-address number-of-bytes 
The above example would then read: 

*SAVE“sort” 0D00 +21 


To load machine code: 
The format is: 


*LOAD*“name” 


The code will be loaded into the same address band as when saved. An 
alternative format is: 


*T OAD“name” first-address 


This is used (rarely) if, for some reason, it is required to load the code intoa 
hex address, different to that used when the code was saved. 


To run machine code: 

The majority of machine code is likely to be called from within an outline 
BASIC program. If, however, the code is self-supporting it can be run by 
using: 


*RUN “name” 


Summary 
1. Assembly code within BASIC must be enclosed within [ ]. 
2. Hex op-codes are replaced by three-letter mnemonic groups. 


I]. 


12. 


3. 
14. 


1S, 


16. 
17. 


18. 
19. 


20. 


21. 


Zz. 


23. 
24. 


25. 


26. 


zi. 
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. All numerics are assumed decimal unless preceded by &. 
. Operands can be absolute numeric or any legitimate BASIC 


variable expression but not BASIC commands. 


. Registers can only hold one byte so any excess high order operand 


digits are dropped. 


. BASIC variables cannot be assigned within assembly code. 
. Conditional branch destination can be to a symbolic label. 
. A branch destination label must be preceded by the full-stop. There 


must be no full-stop when the label is in the operand position. 


_ Each statement on the same line must have the full colon as a delimiter. 
. P% is special. The contents will be the address of the first byte of the 


assembly code. 

Assembly code can be located by either absolute addressing, by 
modified DIM statement, below BASIC by using PAGE or between 
the end of BASIC program and the start of the ‘variable space’ by use 
of LOMEM. 

The BASIC keyword RUN assembles any machine code present but 
only CALL can execute it. 

CALL can be a statement within BASIC or a separate command. 
OPT is a pseudo operation (not a BASIC keyword) so is valid only 
within the square brackets. 

OPT defaults to OPT 3 which instructs the assembler to produce a 
listing and to report any errors. 

A fully debugged, tested program would normally be run under OPT 0. 
Assembly code which contains a branch to a forward (higher) address 
requires two passes through the assembler. 

A P% assignment must be inside the FOR loop for two-pass assembly. 
Macros, although not directly available in the assembler, can be 
implemented by using procedures. 

The square brackets, denoting assembly code are legal within DEF 
PROC. 

Macros, implemented by procedures, require a temporary exit from 
the assembler. 

Conditional assembly is not directly available but can be implemented 
by the IF THEN ELSE structure. 

Parameters can be passed to assembly subroutines by CALL or USR. 
In the BBC machine, the address band beginning at &0600 is the 
parameter-block used by CALL. 

The parameter-block contains only the address of the parameters not 
the parameters themselves. 

The parameter begins with the number of parameters. Three items for 
each parameter then follow. 

The first and second items give the low- and high-byte address and the 
third gives the parameter type. 
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28. 


29. 


30. 


31. 


32. 


Ios 


34. 


Ja. 
36. 


37. 


38. 


39. 


The parameter type code is 0 for single byte, 4 for four-byte integer, 5 
for five-byte floating point variable. 

The parameter type code 80 is for defined address strings and 81 is for 
normal string variables. 

Because the CALL parameter block contains only addresses of data, 
the data is easily recovered by using indirect addressing. 

Before using indirect addressing, Y (or X) are usually zeroed and 
parameter addresses transferred to page zero. 

Parameters passed by USR are by A%, X%, Y% into the same named 
6502 registers. The lsb of C% is passed to the carry bit in P (process 
status register). 

The result of USR action (if any) is passed to any designated integer 
variable. 

The USR result is the combined contents of P, Y, Xand Aas they were 
on exit from the subroutine. 

The most significant byte of the USR result is that contributed by P. 
Indirection operators ?, !, $ before a variable refer to single-byte, four- 
byte and string respectively. 

The presence of * before SAVE or LOAD indicates that the actions 
refer to assembly code programs. 

Provision exists within * LOAD for loading assembly programs into a 
different address block from that used when *SAVE was used. 
When using *SAVE, either the last address+ 1 of the block is stated or 
the number of bytes. 


Self Test 


4.1 


4.2 
4.3 
4.4 


4.5 


4.6 


4.7 


What is the error in the following segment of code? 
LDA Number:. Label CLC:ADC #&20:BNE .Label 


RUN, CALL and OPT. Which of these are pseudo-operations? 
What is the default number in OPT n? 

If the only branch instruction in a code section was BNE &83, would it 
require two-pass assembly? 

Some code is called with CALL name F%, Blogs. State: 


(a) The hex number stored in address &0600. 
(b) The contents of address &0603. 
(c) The contents of address &0606. 


The result variable, returned by a USR call, was &12345678. 


(a) What number was in the X register? 
(b) What number was in the process status register? 


Using the word indirection operator, write a BASIC statement which 
will print out in hex the contents of address 34587. 


Chapter Five 
Multi-byte Loops 





Two-byte operations 


Single-byte working is ideal for illustrating the basic principles of the 6502 
or, indeed any other 8-bit microprocessor. However, machine code 
programs of practical value must assume that numbers will greatly exceed 
the capacity of a single byte. Multi-byte (or multi-precision) working is the 
software solution. In other words, an 8-bit microprocessor can by using 
suitable software, simulate a microprocessor of (theoretically) any desirable 
word length. There are penalties, of course, the most important being 
increased execution time and extra programming involved in arranging the 
component bytes. The programs in this chapter are kept simple since they 
are only intended as guidance on the formation of loops involving rev 
counts greater than 255. 


Incrementing a two-byte number 

Incrementing the loop counter (in cases where the number of revs round the 
loop exceeds 256) poses problems associated with two-byte numbers. The 
following segment of code is a simple solution: 


INC NUMBER 
BNE SKIP 

INC NUMBER+1 
. SKIP 


NUMBER isthe low order byte of the loop counterand NUMBER +1 the high 
order byte. While the count remains less than 255, only the low order byte is 
incremented because of the branch to SKIP 


Decrementing a two-byte number 
The following is as economical (in execution time) as any: 


SEC 

LDA NUMBER 
SBC #1 

STA NUMBER 
BCS SKIP 
DEC NUMBER+1 
«SKIP 
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Note that SBC is used for decrementing the low-order byte instead of DEC. 
This is because: 


(a) DEC will not affect the carry flag. 
(b) The Z flag cannot be used because the high-byte is only decremented 
when the low-byte has passed through zero. 


Adding two single-byte numbers 

Even when the numbers are individually within the capacity ofa single byte,a 
double-byte result must be allowed for. The following segment allows for 
this: 


LDA #0 

STA SUM+1 
CLC 

LDA NUMBERI 
ADC NUMBER2 
STA SUM 

BCC SKIP 
INC SUM+1 

» SKIP 


Adding a single-byte number to a double-byte number 
The following short program illustrates how a single-byte number can be 
added to a double-byte number: 


CLC 

LDA NUMBER1 
ADC NUMBER2 
STA NUMBER1I 
BCC SKIP 

INC NUMBER+1 
» SKIP 


The example programs which follow will pass parameters by means of 
the CALL statement and, consequently, will take advantage of indirect 
indexed addressing. It would be possible, and perhaps simpler, to make use 
of the word-indirection operator. However, the advantages of indirect 
addressing, the concept of address pointers and the power of the CALL 
statement justify the extra programming work. This is a useful habit to 
acquire, since most machine code routines will ultimately be called from 
BASIC. We shall use the word-indirection operator only ina BASIC print 
role. 
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Four-byte operations 
Simple four-byte addition 


Integer variables in the BBC and Electron occupy four bytes. The flowchart 
shown in Fig. 5.1 illustrates the addition of two 32-bit integers. 


FROM BASIC 


GET ADDRESSES 
OF A% AND B% 


STORE FIRST 
STORE SECOND 
(ZERO PAGE) 





CLEAR Y INDEX 

CLEAR CARRY 

SET BYTE COUNT TO 
4 








ADD INTEGERS, A 
BYTE AT A TIME 
(INDIRECT INDEXED) 








INC Y 


DECREMENT BYTE 
COUNTER 


TO BASIC 


Fig. 5.7. 32-bit integer addition. 


The flowchart begins at the point where the two variables to be added 
(A% and B%) have been received from the CALL statement in BASIC with 
their addresses passed to the parameter block at &0600. These addresses 
now become the address pointers ‘FIRST’ and ‘SECOND’ which are 
transferred to zero-page locations. 

The four-byte loop is then initialised by: 
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(a) setting the Y index to 0 (ready for the low order byte in each integer), 
(b) setting the loop counter (X) to 4, 
(c) clearing the carry. 


Each time round the loop, the following actions occur: 


(a) The corresponding bytes of each integer are added using indirect 
indexing and taking the carry bit into consideration. 
(b) The byte sum is transferred to ‘RESULT’, this time using indexed 


addressing. 
(c) The Y index is incremented ready for action on the next higher order 


byte. 
(d) The loop counter (X) is decremented. 


The loop exits after the most significant byte pair has been added which is 


>LIST 
10 REM 32bit INTEGER ADDITION 
20 MODE4 
30 FIRST=&70: SECOND=&72 
40 RESULT=&%&80 
790 ADD=%&0C00 
60 FOR PASS=0 TO 2 STEP 2 


7O PZ=ADD 

80 COPT PASS 

90 LDA &0601 \STORE ADDRESSES 
100 STA FIRST \OF BASIC INTEGERS 
110 LDA &0602 \AZ AND BZ IN 

120 STA FIRST+1 \ZERO PAGE 


130 LDA &0604 

140 STA SECOND 

150 LDA &0605 

160 STA SECOND+1 

170 LDY #0 

180 LDX #4 \SET BYTE COUNTER 
190 CLC 

200 .ADDLOOP 

210 LDA (FIRST),Y \ADD INTEGERS 

220 ADC (SECOND),Y \A BYTE AT A TIME 
230 STA RESULT,Y \USING INDIRECT 


240 INY \NINDEXED ADDRESSING 
230 DEX \BRANCH ADDLOOP 

260 BNE ADDLOOP \IF BYTE CTR=0 

270 RTS: ] 

280 NEXT PASS 

290 CLS | 


300 INPUT"FIRST INTEGER ",A% 
310 INPUT"SECOND INTEGER ",B% 
320 CALL ADD, A%, Bx 

330 PRINT"ADDITION= "3 '&B80 


Program 5.1 32-bit integer addition. 
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when the loop count has reached zero. The control then passes back to 


BASIC. 

The complete assembly coding is givenin Program 5.1. It can be deduced 
from line 30 of Program 5.1 that the address pointers FIRST and 
FIRST+1 occupy &70 and &71. Also SECOND and SECOND+ | address 
pointers occupy &72 and &73. The RESULT, in &80 and &81 is the data 
itself, not an address pointer. This is confirmed by line 230 which shows 
that simple indexed (not indirect) addressing is used for RESULT. 


Simple four-byte subtraction 
Because of the close similarity with the previous program, a flowchart was 
not considered necessary, so only the listing is given in Program 5.2. 


>LIST 
10 REM S2bit INTEGER SUBTRACTION 
20 MODE4 
sO FIRST=&70: SECOND=&72 
40 RESULT=&80 
20 SUBTRACT=&0C00 
60 FOR PASS=0 TO 2 STEP Z 
7Q PZ=SUBTRACT 
80 COPT PASS 


70 LDA &0601 \STORE ADDRESSES 
100 STA FIRST \OF BASIC INTEGERS 
110 LDA &0602 \AZ AND BZ IN 
120 STA FIRST+1 \ZERO PAGE 


130 LDA %0604 

140 STA SECOND 

150 LDA &0605 

160 STA SECOND+1 

170 LDY #0 

180 LDX #4 \SET BYTE COUNTER 
190 SEC 

200 .ADDLOOP 

210 LDA (FIRST),Y \SUBTRACT INTEGERS 
220 SBC (SECOND),Y 

230 STA RESULT, Y \USING INDIRECT 


240 INY \INDEXED ADDRESSING 
2390 DEX \BRANCH ADDLOOP 

260 BNE ADDLOOP \IF BYTE CTR<>0 

270 RTS: } 

280 NEXT PASS 

290 CLS 


300 INPUT”"FIRST INTEGER ",A% 
310 INPUT"SECOND INTEGER ",B% 
320 CALL SUBTRACT, AZ, Bx 

330 PRINT"SUBTRACTION= "; '&80 


Program 5.2. 32-bit integer subtraction. 
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Multiple byte loop (up-counting) 
It is useful to have a skeleton program for performing a certain process n 
times where n is not limited to 256. Figure 5.2 shows the outline flowchart, 


SET UP NUMBER 
IN ZERO PAGE 
INITIALISE 
CYCLE TO ZERO 
iT I 
: | 
i | 


PROCESS 
i | 


INCREMENT 
CYCLE 
(2 BYTES) 


4 : 
NO 


EXIT 


Fig. 5.2. Flowchart for up-count. 


with the particular process left undefined. No attempt is made in the 
flowchart to discriminate between low-byte and high-byte components of 
CYCLE and NUMBER. To do so would entail extra detail which could 
weaken, rather than clarify, the impact of the flowchart. 

Program 5.3 is an implementation of the flowchart in Fig. 5.2 and will 
print the letter H on the screen 1024 times. 

NUMBER (in Program 5.3) is the number of times the process is to be 
completed. CYCLE is the current loop count. Line 30 assigns the two bytes 
of CYCLE to &70 and &71, and NUMBER to &72 and &73. Purely for 
purposes of illustration, NUMBER has been initialised to a constant value 
of 1024 by line 40. This is done by setting the low-byte of NUMBER to 0 
and the high-byte to 4 (equivalent to 4X 256). 

The process, used as an example (painting H on the screen), occupies 
lines 130 and 140 and uses the resident subroutine OSWRCH which is at 
address & FFEE. 


LIST 


Multi-byte Loops 


10-REM MULTIPLE BYTE LOOP (UPCOUNTING) 


220 
250 
240 
230 
260 
2709 


MODE4 


CYCLE=&70: NUMBER=&72 
PR72Z=0: PRTS=4 


START=&0CO00 


FOR PASS=0 TO 2 STEP 2 


PZ=START 
COPT PASS 
LDA #0 

STA CYCLE 
STA CYCLE+1 
. LOOP 

LDA #%48 
JSR &FFEE 
INC CYCLE 
BNE SKIP 
INC CYCLE+1 
. S9KIP 

LDA NUMBER 
CMP CYCLE 
BNE LOOP 


\NINITIALISE CYCLE 
\COUNTER TO ZERO 
\(2 BYTES) 


\PRINT A "H" ON THE 
\SCREEN. 

\INCREMENT THE CYCLE 
\COUNTER BY 1 

\(2 BYTES) 


\COMPARE NUMBER OF 
\CYCLES RE@D TO CYCLE 
\COUNTER IF NOT EQUAL 


LDA NUMBER+1\BRANCH TO LOOP 


CMP CYCLE+1 
BNE LOOP 
RTS: J 

NEXT PASS 
CALL START 


\(2 BYTES) 


Program 5.3. Multiple-byte loop (up-counting). 


Multiple-byte loop (down-counting) 
Providing the sole criterion is that a process is carried out the requisite 
number of times, it matters little whether the loop counter starts at zero and 
increments or starts with a finite number and decrements towards zero. 
However, as discussed ina previous chapter, the decrement method (down- 
counting), is both simpler and faster in execution. No comparison 
instructions appear and therefore there will be no need to assign 
NUMBER. Program 5.4 is identical in objective to the previous program 
but uses this down-counting method. 


-LIST 


10 REM MULTIPLE BYTE LOOP (DOWNCOUNTI 


NG) 
20 
30 
40 
30 
60 
7QO 


MODE4 
CYCLE=&70 


2?&70=03 7&71=4 


START=&0C00 


FOR PASS=0 TO 2 STEP 2 


PZ=START 


Program 5.4. Multiple-byte loop (down-counting). 


107 


B80 COPT PASS 
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90 LDA #%48 \PUT A "H" ON THE 
100 JSR &FFEE \SCREEN 

110 SEC 

120 LDA CYCLE \DECREMENT CYCLE 
130 SBC #1 \COUNTER BY 1 

140 STA CYCLE \(2 BYTES) 

150 BCS SKIP 

160 DEC CYCLE+i 

170 .SKIP 


180 “LDA CYCLE 
190 BNE START 
200 LDA CYCLE+1 
210 BNE START 
220 RTS: 1] 

230 NEGT PASS 
240 CALL START 


\COMPARE CYCLE COUNTER 
\TO ZERO, IF NOT EQUAL 
\BRANCH TO START 


Program 5.4. contd 


It is worth comparing the two programs side by side to dispel lingering 
doubts as to which is the more elegant. 


Adding an array of integers 


Program 5.5 adds four-byte integer numbers held in a BASIC array 


(ARRAY%). For testing purposes only, ARRAY% is filled with random 
eLIST 

10 REM S2bit INTEGER ARRAY SUMMATION 
20 MODE 4 

350 NUMBER=&70: POINTER=&72 

40 RESULT=&80 

JO SUM=%0C00 

60 FOR PASS=0 TO 2 STEP 2 

7Q PZ=SUM 

80 COPT PASS 

90 LDA &0601 \GET NUMBER OF 
100 STA RESULT \INTEGERS IN 

110 LDA &0602 \ARRAY 

120 STA RESULT+1 \STORE IN NUMBER 
130 LDY #0 

- 140 LDA (RESULT),Y 

150 STA NUMBER 

160 INY | 

170 LDA (RESULT),Y 

180 STA NUMBER+i1 

190 LDA &0604 \GET START 
200 STA POINTER \ADDRESS OF ARRAY 
210 LDA &0605 \STORE IN POINTER 


Program 5.5. Integer array summation. 
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220 STA POINTER+1 


250 LDA #0 \NINITIALISE 4 
240 STA RESULT \BYTES FOR RESULT 
290 STA RESULT+1 \TOQ ZERO 


260 STA RESULT+2 
270 STA RESULT+3 


280 .LOOP 

290 LDY #0 

300 LDX #4 \SET BYTE COUNTER 
310 CLC 

320 .ADDLOOP \NADD SUCESSIVE 
330 LDA (POINTER), Y \INTEGERS A BYTE 
340 ADC RESULT, Y \AT A TIME. STORE 
350 STA RESULT, Y \CUMUL’ VE RESULT 
360 INY 

370 DEX \DEC BYTE COUNTER 
380 BNE ADDLOOP 

390 CLC 

400 LDA POINTER \ADD 4 TO POINTER 
410 ADC #4 


420 STA POINTER 
430 BCC SKIP 
—440 INC POINTER+1 


450 .SKIP 

460 LDA NUMBER \DECREMENT 
470 SEC \NUMBER BY 1 
480 SBC #1 


490 STA NUMBER 
500 BCS SKIP2 
S510 DEC NUMBER+1 
520 .SKIP2 
S530 LDA NUMBER \IF NUMBER IS NOT 
540 BNE LOOP \ZERO THEN BRANCH 
550 LDA NUMBER+1 \TO LOOP‘(2 BYTES) 
560 BNE LOOP 
570 RTS:] 
580 NEXT 
590 CLS 
- 600 INPUT"HOW MANY RANDOM INTEGERS ",N 
UMBER% 
610 DIM ARRAYZ (NUMBERZ) 
620 FOR N%Z=1 TO NUMBER thes 
630 ARRAY% (N%) =RND/ 100000 Wp Lleoo vo )- 
640 PRINT ARRAY% (NZ) 
650 NEXT 
660 PRINT:PRINT 
670 CALL SUM, NUMBER%, ARRAYZ% (1) 
680 PRINT"SUM= "$s 'RESULT 
690 PRINT:PRINT 
700 PRINT"CHECK USING BASIC" 
710 PRINT 


Program 5.5. contd 
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720 SUM=0 

730 FOR NZ=1 TO NUMBERZ 
740 SUM=SUM+ARRAYZ (NZ) 

750 NEXT 

760 PRINT"CHECK= "5;SUM 


>RUN 


HOW MANY gn INTEGERS 75 


20966 
10485 
—-2851 
—-2610 


SUM= 26671 


CHECK USING BASIC 


CHECK= 26671 


Program 5.5. contd 


integers of mixed sign, the number of integers being entered by the user. An 
example computer RUN is shown at the end of the listing. It helps if the 
flowchart, shown in Fig. 5.3 is studied first. 

The program is the first one in this book which illustrates the speed of 
machine code. When assessing the speed, it should be realised that the 
filling of the array and the scrolled display of the numbers is carried out in 
BASIC. The speed referred to applies only to the machine code portion 
which performs the actual addition. A parallel addition check is carried out 
in BASIC, primarily for speed comparisons. To compare the machine code 
speed with the BASIC equivalent, run the program with 4000 integers 
instead of with 5 as shown in Program 5.5 and note that the machine code sum 
appears almost instantaneously after the numbers stop scrolling. The 
BASIC check on the addition takes many seconds. The program should be 
fairly easy to follow from the comments on the listing. It uses some of the 
coding blocks previously discussed. 
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START 


GET NUMBER OF INTEGERS 
IN BASIC ARRAY 
STORE IN NUMBER (2 BYTES) 


GET START ADDRESS OF BASIC 
INTEGER ARRAY 
STORE IN POINTER (2 BYTES) 


INITIALISE RESULT TO ZERO 
(4 BYTES) 


SET BYTE-COUNTER TO 4 


ADD SUCCESSIVE INTEGER BYTES 
STORE CUMULATIVE SUM IN RESULT 
(4 BYTES) 


DECREMENT BYTE COUNTER 
(4 BYTES) 


BYTE COUNT =0 
1 BYTE NO 


ADD 4 TO POINTER 
(2 BYTES) 
DECREMENT NUMBER BY 1 
(2 BYTES) 


NO 


RTS 


Fig. 5.3. Flowchart of Program 5.5. 
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Summary 


1. 


10. 


The 6502, in common with most microprocessors, has a word length of 
8-bits, limiting the magnitude of signed integers to +127 and —128 and 
unsigned integers to 255. 


. The 8-bit word length is merely a hardware limitation, easily overcome 


by means of software. 


. Separate 8-bit words can be considered ‘joined’ end to end in order to 


simulate long word lengths. The simulation is perfect in most respects 
except execution time. 


. Incrementing a double-byte number proceeds initially with the low- 


order byte. The high-order byte is incremented only when the count goes 
over the top from 255 (& FF) to zero (&00). 


. Decrementing a double-byte number is similar but SBC is preferable to 


DEC because a carry rather than a zero is required for the inner loop test. 


. Most machine code programs are entered from BASIC, so loop 


parameters can be easily passed by use of CALL. 


. Integer variables in BASIC occupy 4 bytes (32 bits). 
. In signed integer work, the sign bits in the three lower order bytes are 


ignored. Only the highest order byte carries real sign information. 


. When performing loop counts, it is normally more efficient to count 


down towards zero rather than up towards a finite number. 

When estimating the speed ofthe example programs, remember thatthe 
BASIC sections, which call and initialise the machine code, squander 
most of the execution time. 


Self Test 


5.1 A two-byte counter is holding 770 decimal. Write the bit pattern in the 


high-order byte. 


5.2 Atwo-byte counter is holding 1801 decimal. Write the bit pattern in the 


low-order byte. 


5.3 A four-byte counter is holding —1. What hex number is the highest 


order byte holding? 


5.4 Signed integers in the BBC machine occupy four bytes. What is the 


largest positive number possible (to the nearest million)? 


5.5 If two single-byte numbers of opposite sign are added, could the result 


ever exceed the capacity of a single byte. 


Chaper Six 
Sort Routines 





Apart from personal interest and/or intellectual stimulation, there is little 
point in adopting a partisan approach to machine code. It is pointless to 
view BASIC, particularly BBC BASIC, as a language inferior to machine 
code. The two should complement, rather than rival, each other. Once 
familiarity and confidence is gained in handling machine code, it will 
gradually become clear which parts of a BASIC program should be 
relegated to machine code and which parts can be handled quite adequately 
in BASIC. 

There can be no doubt, however, that one area in data processing which 
calls out for machine code solutions is sorting data into numerical or 
alphabetical order. It has been stated that approximately 30% of all 
commercial computing time is spent on some kind of sorting activity. An 
ordered system of any kind represents a ‘high energy’ system. Since the 
equation for energy in physics is power multiplied by time, we would 
therefore expect that programs which sort data will make heavy demands 
on computing time. esl : 

The physical power of a given computer is fixed by the hardware, whichin 
turn depends on such things as the clock frequency, word length and the 
sophistication built into the central processor (in the BBC, the central 
processor is the 6502). Although in no way meant as criticism, the machine, 
and indeed most other microcomputers likely to be found in the average 
home, are slow in terms of ‘mips’(million instructions per second). The 
BBC machine is rated at about 0.5 mips. In contrast, some of the modern 
mainframe giants have reached a speed approaching 100 mips with a word 
length of 64 bits and it is confidently expected that this figure will be 
substantially beaten by the forthcoming breed of fifth generation machines. 
Returning to present day reality, there is nothing we can do about the 
limitations imposed by the hardware of our machine. The only method of 
attack is to use software which takes the fullest advantage of the machine. 

This chapter is devoted entirely to the problem of machine code sorting 
routines. A simple bubble sort, limited to single byte data, was introduced 
in the previous book, Discovering BBC Micro Machine Code. The 
following programs treat the sorting problem in more detail and will 
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include the sorting of arrays of four-byte BASIC integers, strings, floating 
point numbers and two- dimensional string arrays. It will also include 
routines which sort fixed length multi-field records. They are given 
complete with BASIC parameter passing lines so they can be entered and 
exhaustively tested. It should be pointed out that the machine code portion 
of the listings will stand alone as subroutines, providing that: 


(a) the correct parameters are passed from any BASIC program; 

(b) the code is lodged either in one of the safe areas (not necessarily the areas 
used in our listings) or dynamically, above or below BASIC, by the use of the 
DIM statement. 


All the programs in this chapter are concerned with sorting data into 
either numerical or alphabetical order and will be useful in compiling 
indexes, customer lists, domestic accounts, hobby collections (butterflies, 
stamps, beetles) etc. They could also be employed as routines within 
general purpose filing systems to store or retrieve information according to 
some predetermined order. 


Bubble sort of a BASIC integer array 


The bubble sort is well-known but often despised because it is slow. It is one 
of the simplest sort routines to understand and, providing there are not too 
many elements in the array, can be acceptable if written in machine code. It 
provides a good starting point for handling multibyte integers. 

Because the programs which follow are intended to be used in 
conjunction with BASIC, via CALL parameters, it is important to 
understand how the interpreter allocates variable space. 


How integer array variables are stored 
The four bytes, allocated to each integer array variable, are aranged as 
follows: 
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START 


GET NUMBER OF INTEGERS IN BASIC ARRAY 
1 | FROM CALL PARAMETER BLOCK AND STORE IN 
NUMBER (2 BYTES) 


DECREMENT NUMBER BY 1(2 BYTES) 


3 SET FLAG TO ZERO (i BYTE) 
SET CYCLE TO ZERO (2 BYTES) 


nh 


OBTAIN START ADDRESS OF INTEGER ARRAY 
4 | FROM CALL PARAMETER BLOCK AND STORE IN 
POINTER 2 (2 BYTES) 


5 POINTER 2 BECOMES POINTER 1 (2 BYTES) 


ADD 4 TO POINTER 1 AND STORE IN POINTER 2 
(2 BYTES) 


a 


COMPARE INTEGERS 


YES 
IN ORDER 


SET SWAP FLAG 


™N 





SWAP INTEGERS A BYTE AT A TIME (4 BYTES) 


INCREMENT CYCLE (2 BYTES) 


co 
| 


NO CYCLE = NUMBER 
2 BYTES 


SWAP 
FLAG CLEAR 
(1 BYTE) 


11 DECREMENT NUMBER BY 1 (2 BYTES) 


10 











YES 


12 


UMBER =0 NO 
(2 BYTES) 


RETURN 


Fig. 6.7. Flowchart for integer array bubble sort. 
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Each integer of the array is then stored sequentially, so in effect, each 
element has an address 4 bytes in advance of the previous integer. 

The machine code routine is executed from BASIC via the CALL 
statement: 


CALL SORT, NUMBER%, ARRAY% (1) 


Of course, the above variable names are arbitrarily chosen but they must be 
in the above order where: 


SET BYTE 
COUNTER TO 4 
SET CARRY FLAG 







SUBTRACT ist 
INTEGER BYTEFROM 
2nd INTEGER BYTE 








DECREMENT 
BYTE COUNTER 





| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
errecnaney re | 
| 

| 

| 

| 

| 

| 

| 

| 

| 

| 

| 

| 

| 


CARRY FLAG YES 
SET 






SET SWAP FLAG 





SWAP INTEGERS, 
A BYTE AT A TIME 
(4 BYTES) 







Fig. 6.2. Expansion of block 7 in Fig. 6.1. 
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SORT=the start address of the routine 
NUMBER%=the number of elements in the array 
ARRAY% (1) =the first usable element in the array 


The zero element is best left vacant for headings, etc. 

The flowchart for the routine is given in Fig. 6.1. As can be seen, the 
algorithm consists basically of an inner control loop and an outer control 
loop. The pairs of integers are repeatedly incremented, compared (and if 
necessary swopped) in the inner loop. The largest integer in the list always 
‘bubbles’ through to the last position. It will no longer be necessary to 
involve this integer again, so the outer loop count may be reduced by one. 
On the next inner loop series of cycles, the next largest integer ‘bubbles’ 
through to the last but one position in the list and so on, until the list is fully 
sorted. The maximum number of comparisons is approximately equal to 
half the square of the array size. The execution time can be reduced if the 
list is not too disordered by the use of a swop flag technique as shown in the 
flowchart. Note from the flowchart that the blocks have been numbered for 
reference purposes. Block 7 is shown expanded in Fig. 6.2. The listing 
corresponding to the flowchart is given in Program 6.1. 


LIS] REM BUBBLE SORT OF ARRAY OF 
20 REM 32bit UNSIGNED INTEGERS 
30 NUMBER=&70: CYCLE=&72: POINTER1=&74 
40 POINTER2=%&76: FLAG=&78 
SO DIM SORT 500 
60 FOR PASS=0 TO 2 STEP 2 
70 PY%=SORT 


80 COPRPT PASS Meret ee ee ee Se 


90 LDA &0601 \GET NUMBER OF 
100 STA CYCLE \BASIC INTEGERS 
110 LDA &0602 \IN ARRAY AND 
120 STA CYCLE+1 \STORE IN NUMBER 
130 LDY #1 
140 LDA (CYCLE),Y 
i50 STA NUMBER+1 
i160 DEY 
170 SEC 
180 LDA (CYCLE),Y \ EEE EEE HEE EE 
190 SBC #1 \DECREMENT 
200 STA NUMBER \NUMBER 
210 BCS OUTERLOOP 
220 DEC NUMBER+1 
2-0 .~OUTERLOOP \ 96 36 FE SE FE HE DE FE HE HE FE 4 4 
240 LDA #0 \INITIALISE 
240 STA FLAG \SWOP FLAG AND 
260 STA CYCLE \CYCLE TO ZERO 
270 STA CYCLE+1 
280 LDA &%0604 \, FE3E- FEE FE HE HE HE IE HEE 9 HE HE HE 
270 STA POINTER2 \STORE FIRST INT 


Program 6.7. 


Bubble sort of an unsigned 32-bit integer array. 
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S00 
310 
320 
330 
340 
300 
360 
370 
380 
390 
400 
410 
20 
430 
440 
450 
460 
470 
480 
490 
TOO 
310 
v20 
var ed © 
740 


uO 


360 


w/7O 
380 
370 
600 
610 
620 
620 
640 
6350 
6560 
670 
680 
6590 
7Q0 
710 
720 
730 
740 
7Fa0 
760 
7/9 
780 
790 
BOO 


LDA 20605 

STA POINTER2+1 
- INNERLOOP 
LDA POINTER2+1 
STA FOINTERI+1 
LDA POINTER2 
STA FOINTERI 
CLC 
ADC 
STA 


#4 

POINTER2 
BCC SKIP 

INC POINTER2+1 
- SKIP 

LDX #4 

LDY #0 

SEC 

. COMPLOOP 

LDA (POINTER?) ,Y 
SBC (POINTERI).Y 
INY 

DEX 

BNE COMPLOOP 
RCS NOSWOP 

DEY 

STY FLAG 

- SWOPLOOP 
LDA(POINTERI),Y 
TAX 
LDACPOINTER2) ,Y 
STACPOINTERI),Y 
TXA 
STA(POINTER2) ,¥ 
DEY 

BPL SWOPLOOP 

. NOSWOP 


INC CYCLE 


BNE SkKIF2 

INC CYCLE+1 
~SKIPZ 

LDA CYCLE 

CMP NUMBER 
BNE INNERLOOP 
LDA CYCLE+1 
CMP NUMBER+1 
BNE INNERLOOP 
LDA FLAG 

BEQ FLAGCLEAR 
LDA NUMBER 
SEC 
SBC 
STA 


#1 
NUMBER 


\NADDRESS TEMP IN 
\POINTER2 

\ ERE EEE ERE EEE 
\ TRANSFER 
\POINTERS 
\POINTERI = 
\PFOINTER2 

\ REE HEE EEE HHEE 
\ADD 4 TO 
\POINTER 1 AND 
\STORE IN 
\POINTER2 

\ ERE EEE EERE HREE 
\SUBTRACT INT’*S 
\A BYTE AT A 
\TIME (4 BYTES) 
\KEEPING TRACK 
\OF CARRY FLAG 
\AT COMPLETION 


\ BRANCH NOSWOF 
\IF CARRY SET 


\SET SWOP FLAG 

\ EHR EERE 
\SWOP INTEGERS 
\A BYTE AT A 
\TIME 


\ EE EE EEE 
\ INCREMENT 
\CYCLE 

\ HEHE EEE HHH 
\COMPARE CYCLE 


\TO NUMBER. IF 
\<> BRANCH TO 
\ INNERLOOP 


\IF SW.FLG CLEAR 
\BR. FLAGCLEAR 

\ HEHE HEH 
\DECREMENT 
\NUMBER 


Program 6.7. contd 
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BCS SKIPS 

DEC NUMBER+1 
. SKIPS 

LDA NUMBER 
BNE OUTERLOOP 
LDA NUMBER+1 
BNE OUTERLOOP 
- FLAGCLEAR 
Riss: J 

NEXT PASS 


810 
820 
B30 
840 
850 
860 
870 
B80 
B90 
900 
710 
920 
930 
940 
950 
960 
970 
980 
990 


\ FEE HE HE HE FE IE HE EE EE IE EE 
\COMPARE NUMBER 
\TO ZERO 

\BRANCH IF <> TO 
\OUTERLOOP 


REM BASIC IS FOR TESTING ONLY 
MODE4 

CLS 

INPUT"NUMBER OF INTEGERS ",NUMBERZ 
PRINT 

DIM ARRAYZ (NUMBERZ) 

FOR NZ=1 TO NUMBERZ 
ARRAYZ (NZ) =RND (10000) 


1000 
19010 
1020 
19030 
1040 
1050 
1060 
1070 
1080 
1090 
1100 
1110 


PRINT ARRAYZ (NZ) 
NEXT 
PRINT 
PRINT 
PRIN 
STARTZ=T IME 

CALL SORT, NUMBERZ, ARRAYZ (1) 
timeZ=TIME-STARTZ% 

FOR NZ=1 TO NUMBERZ 

PRINT ARRAYZ (NZ) 

NEXT 

PRINT 


"SORTING" 


1120 PRINT"SORTING TIME= 
SECONDS” 


"$time%/1005" 


Program 6.7. contd 


Although the listing is supplied with outline remarks, here is a more 
detailed breakdown: 
Lines 30-40 set up the zero-page labelled locations referred to in 
assembly code. 
Lines 90-220 obtain the address of the BASIC variable NUMBER% from 
the CALL parameter block and temporarily store it in CYCLE. Using 
indirect indexed addressing, the data is picked up, decremented by | and 
stored in NUMBER and NUMBER+1I (low-byte and high-byte re- 
spectively). Note, that zero-page locations must be used for indirect 
indexed addressing. 
Lines 240-270 initialise the swop FLAG and CYCLE counter to zero. 
Lines 280-310 store the address of the first integer in the array 
temporarily in POINTER2. This is picked up from the CALL parameter 
block. 
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Lines 320-360 copy POINTER2 contents into POINTERI. 

Lines 370-410 add 4 to POINTERI and store the result in POINTER2. 
The reason why 4 is added is so that POINTER2 is the address of the next 
integer in the array: that is to say, 4 bytes onwards. 

Line 430 initialises the byte counter to 4. The X register is used for this (see 
below). 

Lines 450-520 subtract the integers with carry, a byte at a time, keeping the 
result of the fourth bytes in the accumulator. Notice that the X register and 
DEX is used for byte counting. This is because a CPY instruction would 
corrupt the carry flag during the subtraction processes. If the carry flag is 
set at the end of the subtraction, no swop is required. This method only 
works for unsigned integers. 

Line 530 sets the byte counter for the swop process. DEY is used to set this 
to 3 since the current value of Y is 4. 

Line 540 stores the Y register contents as a swop flag in FLAG (any non- 
zero quantity would do here). 

Lines 560 to 630 swop the integers, a byte at a time, starting with the high- 
byte. Notice that the X register is used as a temporary storage location as 
this is the most economical in execution time since TAX uses only two 
machine cycles, whereas the alternatives PHA or STA require 3 cycles. 
Lines 650-680 increment the CYCLE counter by |. The coding given is an 
economical way to do this in terms of execution time. 

Lines 690-740 compare the low-byte of CYCLE and NUMBER. If the result is 
non-zero, a branch is made to the label INNERLOOP. Ifthe result is zero, the 
program ‘falls through’ to compare the high-byte in the same manner. 
Lines 750-760 check if the swop FLAG is clear. If so, a branch to 
FLAGCLEAR is made. 

Lines 770-820 decrement NUMBER by | (2 bytes, of course). 

Lines 840 to 870 check if NUMBER has reached zero, first checking the 
low-byte with branching to OUTERLOOP if not true, otherwise ‘falling 
through’ to compare the high-byte. 

Lines 920-1120 are pure BASIC just for test and familiarisation purposes. 
They first print out a random integer array, call the machine code routine 
and then print out the sorted result together with the time taken. See Table 
6.1 at the end of this chapter for a compendium of timing data. 


Bubble sort of a BASIC string array 


This routine is capable of sorting a BASIC string array, where the string 
elements can vary in length up to the legal maximum of 255. 


How string arrays are stored 
When a String array is set up by the interpreter, four bytes are used ina 
similar manner to integers. These bytes, are not the strings themselves but 
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OBTAIN ADDRESSES AND LENGTH OF STRINGS 
FROM STRING INFORMATION BLOCK. 
STORE STRINGS 1 AND 2 (2 BYTES EACH) AND 
LENGTH 1 AND 2(1 BYTE EACH) 






INITIALISE CHARACTER COUNTER TO ZERO 


COMPARE CURRENT STRING 1 AND STRING 2 
ASCII CODES 





YES 

















ASCII CODES 
EQUAL 


INCREMENT CHARACTER COUNTER 


COMPARE CHARACTER COUNTER TO LENGTH 1 
(1 BYTE) 


COMPARE CHARACTER COUNTER TO LENGTH 2 
(1 BYTE) 


SET SWAP FLAG 


SWAP STRING INFORMATION BLOCK A BYTE AT 
A TIME (4 BYTES) 


Fig. 6.3. Block 7 expansion for string array sort. 
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length details and the address of where the string is actually stored. These 
four bytes are referred to as a String Information Block, the details of 
which follow: 


ilteonnnmmarenicinsin: TACT CARINE TCTOOLG cece 


! : maximum | high-byte low-byte : 
| length ; length 
: Start address : 


The actual string, consisting of the ASCII codes in sequential memory 
locations, is stored from the starting address given in bytes | and 2 above. A 
string array is formed by a series of such string information blocks, stored 
sequentially in memory. Therefore, if we want to swop strings (such as 
during a string sort) it is necessary only to swop the string information 
blocks since these tell the system where the strings are stored. This makes 
programs involving machine code sorting much easier to program since 
most of the work has already been done by the interpreter. Byte 3 of the 
String Information Block gives the maximum number of characters 
allowed before discarding original reserved space. This data is rarely used 
in practice, except in ‘housekeeping’ or garbage disposal programs and 
consequently is of no present interest to us. Byte 4 of the String 
Information Block gives the actual length of the string in bytes (characters). 

The CALL parameter block is first set up in BASIC and requires the 
following variables in the CALL statement: 


CALL SORT, NUMBER%, ARRAY$ (1) 


The above call is in the same form as that of the previous call, except that 
ARRAYS$ (1) is used. 

The flowchart is essentially the same as Fig. 6.1 with one proviso, the 
details of block 7. The flowchart, showing the amendment is given in Fig. 
6.3. The listing is given in Program 6.2. 


10 REM STRING ARRAY BUBBLE SORT 

20 REM WITH VARIOUS LENGTH STRINGS 
30 NUMBER=&70: CYCLE=&72: POINTERI=&74 
40 POINTER2Z=%76: FLAG=&78: string1i=k79 
20 string2=%7B: length1=k7D 

60 length2=%&7E 


Program 6.2. String array bubble sort. 


790 

80 

90 
100 
110 
120 
130 
140 
150 
160 
170 
180 
190 
200 
210 
220 
250 
240 
200 
260 
270 
280 
290 
300 
310 
320 
330 
3540 
300 
3560 
370 
380 
390 
400 
410 
420 
430 
440 
450 
460 
470 
480 
490 
200 
3190 
220 
330 
740 
390 
360 
370 


DIM SORT S00 
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FOR PASS=0 TO 2 STEP 2 


PZ=SORT 

COPT PASS 

LDA #0601 

STA CYCLE 

LDA &0602 

STA CYCLE+1 

LDY #1 

LDA (CYCLE),Y 
STA NUMBER+1 

DEY 

SEC 

LDA (CYCLE),Y 
SBC #1 

STA NUMBER 

BCS OUTERLOOP 
DEC NUMBER+1 

. QUTERLOOP 

LDA #0 

STA FLAG 

STA CYCLE 

STA CYCLE+1 

LDA &0604 

STA POINTER2 

LDA &0605 

STA POINTER2+1 

. INNERLOOP 

LDA POINTER2+1 
STA POINTERI+1 
LDA POINTER2 
STA POINTERI 

CLC 

ADC #4 

STA POINTER2 

BCC SKIP 

INC POINTER2+1 

. SKIP 

LDY #0 

LDA (POINTERI),Y 
STA stringl 

LDA (POINTER2),Y 
STA string2 

INY 

LDA (POINTERI),Y 
STA stringitl 
LDA (POINTER2),Y 
STA string2tl 
LDY #3 

LDA (POINTERI),Y 
STA lengthi 


\ FEE EE IE HE HE HE EE EE HE HEE 
\GET NUMBER OF 
\BASIC STRINGS 
\IN ARRAY AND 
\STORE IN NUMBER 


ee 
\DECREMENT 
\NUMBER 


\ FE SEE FEE TE EE IEE HE EE 
NINITIALISE 
\SWOP FLAG AND 
\CYCLE TO ZERO 

\ FEE ETE HE TE HEHE EEE HE EEE 
\STORE START 
\NADDRESS OF 

\$ INF BLOCK IN 
\POINTER2 TEMP. 
\ ERE EEHEEEEEEEEEE 
\ TRANSFER 
\POINTERS 
\POINTER1I= 
\POINTER2 

\ EEE EEE 
\ADD 4 TO 
\POINTER1 AND 
\STORE IN 
\POINTER2 

\ EERE EEEKEEHEE 
\NOBTAIN ADDRESS 
\AND LENGTH OF 
\EACH OF PAIR 
\OF STRINGS 


Program 6.2. contd 
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380 
3970 
600 
610 
620 
630 
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LDA (POINTER2),Y 
STA Llength2 


640 
650 
560 
670 
480 
590 
700 
710 
720 
730 
740 
750 
760 
770 
780 
790 
800 
810 
820 
830 
B40 
850 
860 
870 
BBO 
890 
700 
910 
920 
930 
940 
950 
960 
970 
980 
990 

1000 

1010 

1020 

1030 

1040 

1050 

1060 

1070 

1080 


LDY #0 ee es ee 
. COMPLOOP \COMPARE STRINGS 
LDA (string2),Y \A CHARACTER AT 
CMP (stringi),Y \A TIME IF 

BCC SWOP \NECESSARY 

BNE NOSWOP 

INY 

CPY lengthil 

BEQ NOSWOP 

CPY length2 

BE@Q SWOP 

BNE COMPLOOP 

\, HEHE FEE TE FEE HE EE EE EEE HE SEE EE EE EE HEE HE HEE HE 


- STAGE \OUT OF RANGE 

BNE OQUTERLOOP \BRANCH PATCH 

\ HEHEHE HEHEHE EHH HEHEHE HEE EEE EEE EEHHEE 
- SWOP 

LDY #3 

STY FLAG \SET SWOP FLAG 

. SWOPLOOP 
LDA(POINTERI),Y \SWOP STRING 


TAX 


\ INFORMATION 


LDA(POINTER2),Y \BLOCK A BYTE AT 
STA(POINTERI),Y \A TIME(4 BYTES) 
TXA 

STA (POINTERZ2) , Y 

DEY 

BPL SWOPLOOP 

. NOSWOP \ HER RMEE RE KH HHH HE 
INC CYCLE \ INCREMENT 

BNE SKIP2 \CYCLE 

INC CYCLE+1 

- Sk I P2 \ RHR HEREKRHRHRRKRHEE 
LDA CYCLE \COMPARE CYCLE 
CMP NUMBER \TO NUMBER 

BNE INNERLOOP \IF <> BRANCH 

LDA CYCLE+1 \TO INNERLOOP 

CMP NUMBER+1 

BNE IWWNERLOOP 

LDA FLAG \IF SW.FLG CLEAR 
BEQ FLAGCLEAR \BR. FLAGCLEAR 

i DA NUMBER ‘“ SPHERE REHREREHRHEE 
SEC \DECREMENT 

SBC #1 \NUMBER 

STA NUMBER 

BCS SKIPS 

DEC NUMBER+1 

. SKIPS 

LDA NUMBER \ REE HEHEREREKERER 


Program 6.2. contd 


1090 
1100 
1110 
1120 
1130 
1140 
1150 
1160 
1170 
1180 
1190 
1200 
1210 
1220 
1230 
1240 
1250 
1260 
1270 
1280 
1290 
1300 
1210 
1320 
1330 
1340 
1350 
1360 
1370 
1380 
1390 
1400 
1410 
1420 
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\IF NUMBER <>0 
\BRANCH OUTERLOOP 
\VIA STAGE 


BNE STAGE 
LDA NUMBER+1 
BNE STAGE 

- FLAGCLEAR 
RTS: J] 

NEXT PASS 


REM BASIC IS FOR TESTING ONLY 

CLS 

MODE4 

INPUT"NUMBER OF STRINGS ",NUMBERZ 
PRINT 

DIM ARRAYS (NUMBER) 

FOR NZ=1 TO NUMBERZ 

strings="" 

FOR ZZ=1 TO RND(10) 
KS=CHRS (RND (26) +64) 
stringsS=string$+t+ks 

NEXT Z% 

ARRAYS (NZ) =strings 

PRINT ARRAYS (NZ) 

NEXT NZ 

PRINT 
PRINT 
PRINT 
STARTZ=T IME 

CALL SORT, NUMBERZ, ARRAYS (1) 
timeZ=TIME-STARTZ 

FOR NZ=1 TO NUMBERZ% 

PRINT ARRAYS (NZ) 

NEXT 

PRINT: PRINT "“NUMBER="5 NUMBERZ 
PRINT 

PRINT"SORTING TIME= "3timeZ/100; " 


"SORTING" 
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SECONDS" 
Program 6.2. contd 


All that is now required is to explain the differences between Program 
6.2 and Program 6.1. In this latter routine, POINTERI and POINTER2 
are the address pointers to the String Information Blocks of the pair of 
strings. A further level of indirect indexed addressing is necessary to pick up 
the actual string characters since the String Information Block supplies 
only the address of where the string is stored. 


Lines 460-540 pick up the start addresses of the pair of strings, using 
indirect indexed addressing. The addresses are stored in string! and 
string2 (zero-page locations). 

Lines 550-590 pick up the fourth bytes of both the String Information 
Block and store them in length! and length2. 

Line 600 sets the character counter to zero (Y register). 
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Lines 610-710 compare the ASCII codes of the pair of strings picked up by 
indirect indexed addressing. On comparison, as soon as the ASCII codes 
are found to be in descending order, the strings are immediately swopped. 
If the ASCII codes are found to be in ascending order, then no swop is 
required. 

Line 660 increments the character counter. 

Lines 670-680 compare the length of the first string (lengthl) to the 
character counter. If they are equal, no swop is required. 

Lines 690-700 compare the length of the second string (length2) to the 
character counter. If they are equal, a swop is initiated. 

Line 710 forces the branch to COMPLOOP which compares the next 
characters in the pair of strings, and so on. 

Lines 730-740 are an out-of-range branch patch and, ideally, should not be 
there. It is due to the 128 byte displacement limit in relative addressing. 
This occurs in Line 1110. 


Merge sort of BASIC integer array 


Although the bubble sort routines given earlier are fast for small numbers 
of elements, the execution time increases alarmingly when in excess of 
about a hundred elements. To see the delay on high numbers, try running 
Program 6.1 with 1000 integers. You could well wait 45 seconds before the 
sort is completed. A far better solution is to use a merge sort algorithm. We 
noted earlier that the bubble sort 1s fairly efficient if only a small number of 
elements are to be sorted. We also noted that the use of a swop flag system 
significantly speeds up the execution of a bubble sort of a roughly ordered 
list. The merge sort algorithm takes advantage of both these virtues. 
Essentially, the array to be sorted is split up into small sets which are bubble 
sorted. These are merged to form larger sets which will be roughly in order. 
These larger sets are bubble sorted and merged to form even larger sets and 
so on until we are left with one large, roughly ordered list. This is finally 
bubble sorted, which will be efficient due to the points made earlier. A 
flowchart for a version of a merge sort is given in Fig. 6.4. 

There is a danger of becoming intoxicated with verbosity in an attempt 
to explain the intricate details of a merge sort. A better grasp of the 
principles can be obtained bya Trace Table. In fact, any program which is 
difficult to follow will benefit from such an analysis. The idea is to follow 
the program through with arbitrary test data, keeping track of what 
happens to the various ‘key’ locations such as loop counters, etc. A Trace 
Table for the flowchart is given in Fig. 6.5. The unsorted array uses 8 
random integers and shows how they would be sorted at various stages of 
the trace. Notice how the array becomes more and more ordered as each 
outer loop cycle is completed. 

If-the flowchart and trace table have been understood, Program 6.3 
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GET NUMBER OF BASIC ARRAY INTEGERS FROM 
1 CALL PARAMETER BLOCK. 
STORE IN NUMBER (2 BYTES 


FIND NEXT POWER OF 2 > = NUMBER. STORE IN 
2 SIZE (2 BYTES). STORE RESPECTIVE INDEX IN 
POWER (1 BYTE) 


3 DIVIDE SIZE BY 2 (2 BYTES) 

* STORE NUMBER - SIZE IN CYCLES (2 BYTES) 

: SET LOOPCOUNT TO ZERO (2 BYTES) 
SET SWAP FLAG TO ZERO (1 BYTE) 


OBTAIN START ADDRESS OF INT ARRAY FROM 
: CALL PARAMETER BLOCK AND STORE IN 
POINTER 1(2 BYTES) 


MULTIPLY SIZE BY 4(2 BYTES) THEN ADD TO 
7 POINTER 1 AND STORE RESULT IN POINTER 2 
(2 BYTES) 


COMPARE INTEGERS 


8 
| 

YES 
IN ORDER 

| 

| 

| 

| 

| 





ADD 4 TO BOTH POINTER 1 and POINTER 2 
(2 BYTES) 


9 
40 INCREMENT LOOPCOUNT (2 BYTES) 


11 


LOOPCOUNT 
a a CYCLES > DECREMENT POWER (1 BYTE) 
(2 BYTES) 
12 
SWAP NO 
FLAG CLEAR POWER = ZERO 
NO (1 BYTE) 


13 DECREMENT CYCLES (2 BYTES) RTS 


14 
CYCLES = ZERO 
2 BYTES 


Fig. 6.4. Merge sort of an integer array. 


should be reasonably easy to decipher. In this program, there is provision 
for sorting signed integers. An expansion of Block 8 of the flowchart (Fig. 
6.4) is given in Fig. 6.6, showing the extra details required. 
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TRACE TABLE FOR A MERGE SORT 


LOCATION LABELS Value after Ist Value after 2nd Value after 3rd 
outer loop completed | outer loop completed | outer loop completed 


NUMBER 





Unsorted array Array after Ist Array after 2nd Array after 3rd 
outer loop completed | outer loop completed | outer loop completed 









\ 1 








Fig. 6.5. Simple Trace Table of merge sort algorithm. 
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SUBTRACT ist INTEGER BYTE 
FROM 2nd INTEGER BYTE 
DECREMENT BYTE COUNTER 
BYTE COUNTER 
= ZERO 





OVERFLO NO 
LAG SET 


| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 







YES 
POSITIVE 


Fig. 6.6. Expansion of Block 8 in Fig. 6.4. 


The program breakdown of Program 6.3 is as follows: 


130 Advanced Machine Code Techniques for The BBC Micro 


PLIST 

10 REM MERGE SORT OF ARRAY OF 

20 REM SIGNED 32bit INTEGERS 

30 NUMBER=&70: POINTERI=&72: POWER=&74 

40 POINTER2=&75: SI ZE=&77: LOOPCOUNT=&7 
9: FLAG=&7B: STORE=&7C: CYCLES=&7E 

YO DIM SORT S00 

60 FOR PASS=0 TO 2 STEP 2 


7O PZ=SORT 

80 COPT PASS \ HEHEHE HE EE HEE HE EE EE 
970 LDA &0601 \GET NUMBER OF 
100 STA STORE \BASIC INTEGERS 
110 LDA &0602 \IN ARRAY AND 
120 STA STORE+1 \STORE IN NUMBER 
130 LDY #0 

140 STY SIZE+1 

150 STY POWER NALSO INITIALISE 
160 LDA (STORE),Y \SIZE AND POWER 
170 STA NUMBER 

180 INY 


190 STY SIZE 
200 LDA (STORE) ,Y 
210 STA NUMBER+1 


220 .SIZELOOP \ EEE EE EE HEHEHE HE HEH 
230 INC POWER \FIND NEXT POWER 
240 CLC \OF 2 >= NUMBER 
290 ASL SIZE \STORE IN SIZE 
260 ROL SIZE+1 

2/70 SEC 


280 LDA SIZE 

290 SBC NUMBER 
300 LDA SIZE+1 
510 SBC NUMBER+1 
3520 BCC SIZELOOP 


330 .OUTERLOOP \ HEE EEE EEE 
340 CLC \DIVIDE SIZE 

3950 LSR SIZE+1 \BY 2 

360 ROR SIZE 

370 SEC \ FOIE TEE EEE HE HE EEE EE 
580 LDA NUMBER \SUBTRACT SIZE 
390 SBC SIZE \FROM NUMBER 

400 STA CYCLES \STORE IN CYCLES 


410 LDA NUMBER+1 
420 SBC SIZE+1 
430 STA CYCLES+1 


440 .MIDLOOP \ EKER ERRKRER BHR HE 
450 LDA #0 | \NINITIALISE 

460 STA FLAG \SWOP FLAG AND 
470 STA LOOPCOUNT \LOOPCOUNT 

480 STA LOOPCOUNT+1 

490 LDA &0604 \ ERR ERR REE EMH ERE 


Program 6.3. Merge sort of an array of signed 32-bit integers. 


500 
510 
520 
530 
540 
550 
560 
570 
580 
590 
600 
610 
620 
630 
640 
650 
460 
670 
680 
690 
700 
710 
720 
730 
740 
750 
760 
770 
780 
790 
800 
810 
820 
B30 
840 
850 
860 
870 
880 
B90 
900 
910 
920 
930 
940 
950 
960 
970 
980 
990 

1000 


STA 
LDA 
STA 
LDA 
STA 
LDA 


POINTERI 
&0605 
POINTERI+1 
SIZE 
STORE 
SIZE+1 

STA STORE+1 
LDX #2 

-MULT4 

CLC 
ASL 
ROL 
DEX 
BNE 
CLC 
LDA 
ADC 
STA 
LDA 


STORE 
STORE+1 


MULT 4 


POINTER! 
STORE 
POINTER2 
POINTERI+1 
ADC STORE+1 

STA POINTER2+1 

. INNERLOOP 

LDX #4 

LDY #0 

SEC 

. COMPLOOP 

LDA (POINTER2),Y 
SBC (POINTER1),Y 
INY 

DEX 

BNE COMPLOOP 

BVC NOOVERFLOW 
EOR #&80 

. NOOVERFLOW 

TAX 

BPL NOSWOP 

DEY 

STY FLAG 

. SWOPLOOP 

LDA (POINTER1I),Y 
TAX 
LDA 
STA 
TXA 
STA 
DEY 
BPL 
BMI 


(POINTER2) ,Y 
(POINTERI),Y 


(POINTER2),Y 


SWOPLOOP 
NOSWOP 


. STAGE1I 
BNE MIDLOOP 
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\STORE FIRST INT 
\ADDRESS IN 
\POINTER1I 

\ FE HE EAE TE TE HE HE HEE HE HE EEE 
\MULTIPLY SIZE 
\BY 4 AND ADD 
\ADD TO POINTER] 
\STORE RESULT IN 
\POINTER2 


\ HEHE EEE EE HE HEHEHE HEE 
\SUBTRACT ONE 
\INTEGER FROM 
\THE OTHER AND 
\KEEP THE MOST 
\SIGNIF ICANT 
\BYTE OF RESULT 
\IN ACCUMULATOR 
\FOR SIGN BIT 


\REVERSE SIGN 
\BIT IF OVERFLOW 
\OCCURS. 

\UPDATE N FLAG 
\N CLEAR, NOSWOP 


\ FE IE FE IE HE FETE FE IE TE IE FE IE EE 
\SWOP INTEGERS 
\A BYTE AT A 
\TIME 


\OUT OF RANGE 
\BRANCH PATCHES 


Program 6.3. contd 
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1010 
1020 
1030 
1040 
1050 
1060 
1070 
1080 
1090 
1100 
1110 
1120 
1130 
1140 
1150 
11460 
1170 
1180 
1190 
1200 
1210 
1220 
1230 
1240 
1250 
1260 
1270 
1280 
1290 
1300 
1310 
1320 
1330 
1340 
1350 
1360 
1370 
1380 
1390 
1400 
1410 
1420 
1430 
1440 
1450 
1460 
1470 
1480 
1490 
1500 
1510 


- STAGE 
BNE OUTERLOOP 


. NOSWOP 
INC LOOPCOUNT 
BNE SKIP 

INC LOOPCOUNT+1 
. SKIP 

LDA POINTERI 
CLC 

ADC #4 

STA POINTERI 
BCC SKIP2 

INC POINTERiI+1 
. SKIP2 

LDA POINTER2 
CLC 

ADC #4 

STA POINTER2 
BCC SKIPS 

INC POINTER2+1 
. SKIPS 

LDA CYCLES 
CMP LOOPCOUNT 
BNE INNERLOOP 
LDA CYCLES+1 
CMP LOOPCOUNT+1 
BNE INNERLOOP 
LDA FLAG 

BEQ FLAGCLEAR 
SEC 

LDA CYCLES 
SBC #1 

STA CYCLES 
BCS SKIP4 

DEC CYCLES+1 
.SKIP4 

LDA CYCLES 
BNE STAGE1 

LDA CYCLES+1 
BNE STAGE1 

. FLAGCLEAR 
DEC POWER 

BNE STAGE2 
RTS: ] 

NEXT PASS 
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\ FE FE SE HE FE HE TE JE HE HE HE SE SE EE 
\ INCREMENT 
\LOOPCOUNT 


\ FE FE FE HE HE HE HE HE HE HE HEHE HE HE HE 
\ADD 4 TO 
\POINTER!1 


\ FEE EE EEE EEE HE HEE EE 
\ADD 4 TO 
\POINTER2 


\ FEE TE TE IEE HE HEE HE HE HEE HE HE 
\COMPARE 
\LOOPCOUNT 

\TO CYCLE 

\IF <> BRANCH 
\TO INNERLOOP 


\IF SW.FLG CLEAR 
\BR. FLAGCLEAR 


\ FEE FEE TE HE HE HE HEE EE HE HEE 
\DECREMENT 
\CYCLES 


\, FEE FE FEE SE SE SE SE FE IE HE SEE 
\IF CYCLES <>0 
\THEN BRANCH 

\TO MIDLOOP 

\VIA STAGE1 

\ HEHEHE EE HE HEHEHE HE 
\DECREMENT POWER 
\IF>O BR. STAGE2 


REM BASIC TEST PROGRAM 


MODE4 
CLS 


INPUT"NUMBER OF INTEGERS ",NUMBERZ 


Program 6.3. contd 





Sort Routines 133 


1520 PRINT 

1530 DIM ARRAYZ (NUMBERZ) 
1540 FOR NZ=1 TO NUMBERZ 
i550 ARRAYZ (NZ) =RND/ 1000 
1560 PRINT ARRAYZ(NZ) 
1570 NEXT 

i580 PRINT 

1590 PRINT "SORTING" 
1600 PRINT 

1610 STARTAZ=TIME 

1620 CALL SORT, NUMBERZ, ARRAY%(1) 
1630 timeZ=TIME-STARTZ% 
1640 FOR NZ=1 TO NUMBERZ 
1650 PRINT ARRAYZ(NZ) 
1660 NEXT NZ 


1670 PRINT 

1680 PRINT"NUMBER= "5; NUMBERZ 

1690 PRINT 

1700 PRINT"SORTING TIME= "StimeZ/100;5 " 
SECONDS" 


Program 6.3. contd 


Lines 90-120 obtain the address of the BASIC variable NUMBER% from 
the CALL parameter block. The address is placed in zero-page locations 
STORE (2 bytes). 

Lines 130-210 pick up the data by indirect indexed addressing and store it in 
NUMBER (zero-page). At the same time, various locations are initialised. 
Lines 220-320 are a block of,code dedicated to finding the next power of 2, 
>= the contents of NUMBER. The result is stored in SIZE (2 bytes). The 
corresponding power index is stored in POWER (1 byte). The op-codes 
ASL and ROL in conjunction are convenient for 2-byte manipulation of 
powers of 2. 

Lines 330-360 divide SIZE by 2 by shifting right. 

Lines 370-430 subtract SIZE from NUMBER and store the result in 
CYCLES (2 bytes). 

Lines 450-480 initialise LOOPCOUNT (2 bytes) and the swop FLAG to 
zero. 

Lines 490-520 pick up the address of the first element in the array and store 
itin POINTER] (2 bytes). 

Lines 530-700 are a block of code which multiplies SIZE by 4 and adds 
POINTERI, storing the result in POINTER2. The reason why it is 
multiplied by 4 is because each integer occupies four bytes. The 
multiplication is achieved by shifting left twice. 

Line 720 sets the byte counter (X register) to 4. 

Lines 730-800 subtract the two integers picked up by indirect indexed 
addressing, keeping the result of the most significant byte (which has the 
sign bit). 

Line 810 checks if the V flag is set. If clear, it skips line 820. 





134 Advanced Machine Code Techniques for The BBC Micro 


Line 820 assumes that the V flag 1s set, so reverses the sign bit. 

Line 840 updates the N flag to the accumulator contents. This is necessary 
because DEX, in line 790, corrupts the N flag. TAX is economical for this 
purpose since it uses only 2 cycles. 

Line 850 tests the sign of the accumulator and by-passes the swop loop if 
positive. This ensures that if both integers are the same, no swopping 
Occurs. 

Lines 890-960 swop the integers, a byte at a time, using indirect indexed 
addressing. 

Line 970 serves no useful purpose in the program, other than causing 
a by-pass of the out of range branch-patch section (lines 990-1020). 

Lines 1050-1080 increment LOOPCOUNT. 

Lines 1090-1220 add the usual 4 to each of POINTERI and 
POINTER2. 

Lines 1230-1280 compare LOOPCOUNT to CYCLES, branching to 
INNERLOOP if not equal. 

Lines 1290-1300 test the swop flag and, if clear, branch to FLAGCLEAR. 
Lines 1310-1370 decrement CYCLES. 

Lines 1380-1410 compare CYCLES to zero, branching to MIDLOOP via 
STAGE1, the out of range branch-patch. 

Lines 1430-1440 decrement POWER and compare to zero, branching to 
OUTERLOOP via STAGE 2. 

Lines 1480-1700 handle the BASIC test routine described earlier. 


Merge sort of a BASIC string array. 


The overall structure of Program 6.4 is similar to Program 6.3, the only 
difference being the substitution of the string comparison routine (block 8 
in the flowchart). This routine has already been described in detail for the 
bubble sort. 


LIST 

10 REM MERGE SORT OF STRING ARRAY 

20 REM WITH VARIOUS LENGTH STRINGS 

30 NUMBER=%&70: POINTER1I=&72: POWER=&74 

40 POINTER2=&75: SI ZE=&77: LOOPCOUNT=&7 
9: FLAG=%7B: STORE=&7C: CYCLES=&7E 

20 string1=%80: string2=k&82: lengthi=&8 
4: length2=%85 

60 DIM SORT 500 

70 FOR PASS=0 TO 2 STEP 2 

80 PZ=SORT 


90 COPT PASS \ FEE ETE HE IE HEE EE EE HE EE 
100 LDA &0601 \GET NUMBER OF 
110 STA STORE \BASIC STRINGS 


Program 6.4. Merge sort of a BASIC string array. 


120 
130 
140 
150 
160 
170 
180 
190 
200 
210 
220 
250 
240 
20 
260 
270 
280 
2970 
300 
310 
320 
330 
340 
350 
3560 
379 
380 
390 
400 
410 
420 
430 
440 
450 
460 
470 
4680 
490 
200 
3710 
220 
330 
740 
IO 
360 
370 
380 
370 
600 
610 
620 


LDA 
STA 
LDY 
STY 
STY 
LDA 
STA 
INY 
STY 
LDA 
STA 


%&0602 
STORE+1 
#0 

SIZE+1 
POWER 
(STORE) ,Y 
NUMBER 


SIZE 
(STORE) ,Y 
NUMBER+1 


. SIZELOOP 


INC 
CLC 
ASL 
ROL 
SEC 
LDA 
SBC 
LDA 
SBC 
BCC 


POWER 


SIZE 
SIZE+1 


SIZE 
NUMBER 
SIZE+1 
NUMBER+1 
SIZELOOP 


- OUTERLOOP 


CLC 
LSR 
ROR 
SEC 
LDA 
SBC 
STA 
LDA 
SBC 
STA 


SIZE+1 
SIZE 


NUMBER 
SIZE 
CYCLES 
NUMBER+ 1 
SIZE+1 
CYCLES+1 


- MIDLOOP 


LDA 
STA 
STA 
STA 
LDA 
STA 


LDA 


STA 
LDA 
STA 
LDA 
STA 
LDX 


#0 

FLAG 
LOOPCOUNT 
LOOPCOUNT +1 
&0604 
POINTER! 
&0605 
POINTERI+1 
SIZE 
STORE 
SIZE+1 
STORE+1 

#2 


- MULT4 


CLC 
ASL 
ROL 


STORE 
STORE+1 
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\IN ARRAY AND 
\STORE IN NUMBER 


\NALSO INITIALISE 
\SIZE AND POWER 


\ HEFCE EEE EEE HEE EE 
\FIND NEXT POWER 
\OF 2 >= NUMBER 
\AND STORE IN 
\SIZE 


\ FEE IE HE HE HE TEE TE EE HE HE HEHE 
\DIVIDE SIZE 
\BY 2 


\ FE FEE IE HE HE HEE EE HEE HE HE 
\SUBTRACT SIZE 

\FROM NUMBER AND 
\STORE IN CYCLES 


\ FE HE FETE TE IE HE TE HEE IE HE HE HE HE 
\SET SWOP FLAG 
\AND LOOPCOUNT 
\TO ZERO 

\ HEE EEE EE HEE 
\STORE START 
\NADDRESS OF 

\ $ INF.BLOCK IN 
\POINTERI 

\, HE HE HEE TE ETE HE JE HE HE HE IE HEHE 
\MULTIPLY SIZE 
\BY 4 AND ADD TO 
\POINTER1 STORE 
\RESULT IN 
\POINTER2 


Program 6.4. contd 
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630 
640 
650 
660 
670 
680 
690 
7Q0 
710 
720 
730 
740 
750 
760 
770 
780 
790 
800 
810 
820 
830 
840 
850 
860 
870 
880 
B90 
900 
910 
920 
930 
940 
930 
960 
9790 
980 
990 
1000 
1010 
1020 
1030 
1040 
1050 
1060 
1070 
1080 
1090 
1100 
1110 
1120 
1130 


DEX 
BNE 
CLC 
LDA 
ADC 
STA 
LDA 


MULT4 


POINTER! 
STORE 
POINTER2 
POINTERI+1 
ADC STORE+1 

STA POINTER2+1 

- INNERLOOP 

LDY #0 

LDA (POINTERI),Y 
STA stringl 

LDA (POINTER2),Y 
STA string2 

INY 
LDA 
STA 
LDA 
STA 
LDY 
LDA 
STA 
LDA 


(POINTER1I),Y 
stringitl 
(POINTER2),Y 
string2+1 

#3 
(POINTERI),Y 
lengthil 
(POINTER2) ,¥ 
STA length2 

LDY #0 

- COMPLOOP 

LDA (string2),Y 
CMP (string1i),Y 
BCC SWOP 

BNE NOSWOP 

INY 
ory 
BED 
CPY 
BEQ 
BNE 


lengthl 
NOSWOP 
length2 
SWOP 
COMPLOOP 


. STAGE1 

BNE MIDLOOP 

- STAGE2 

BNE OUTERLOOP 


LDY #3 
STY FLAG 

. SWOPLOOP 

LDA (POINTER1),Y 
TAX 

LDA (POINTER2),Y 
STA (POINTERI),Y 


\ HERE IE HE HEE HEHE HE HEHEHE HHH 
\OBTAIN ADDRESS 
\AND LENGTH 
\OF EACH OF 
\PAIR OF STRINGS 


\ FE FETE FETE HE IEICE HEHEHE HE HEE 
\COMPARE STRINGS 
\A CHARACTER AT 
\A TIME IF 
\NECESSARY 


\OUT OF RANGE 
\BRANCH PATCHES 


\ FEE FEE HE EEE HEHEHE HEHEHE 
\SET SWOP FLAG 


\ FEE IE HE HE HEE EE EEE 
\SWOP STRING 
\ INFORMATION 
\BLOCK A BYTE AT 
\A TIME(4 BYTES) 


Program 6.4. contd 


1140 
1150 
1160 
1170 
1180 
1190 
1200 
1210 
1220 
1230 
1240 
1250 
1260 
1270 
1280 
1290 
1300 
1310 
1320 
1330 
1340 
1350 
1360 
1370 
1380 
1390 
1400 
1410 
1420 
1430 
1440 
1450 
1460 
1470 
1480 
1490 
1500 
i510 
1520 
1530 
1540 
1550 
1560 
1570 
1580 
1590 
1600 
1610 
1620 
1630 
1640 


TXA 

STA (POINTER2),Y 
DEY 

BPL SWOPLOOP 

. NOSWOP 

INC LOOPCOUNT 
BNE SKIP 

INC LOOPCOUNT+1 
- SKIP 

LDA POINTERI 
CLC 

ADC #4 

STA POINTER! 
BCC SKIP2 

INC POINTERI+1 
- SKIP2 

LDA POINTER2 
CLC 

ADC #4 

STA POINTER2 
BCC SKIPS 

INC POINTER2+1 
- SKIPS 

LDA CYCLES 
CMP LOOPCOUNT 
BNE INNERLOOP 
LDA CYCLES+1 
CMP LOOPCOUNT+1 
BNE INNERLOOP 
LDA FLAG 

BEQ@ FLAGCLEAR 
SEC 

LDA CYCLES 
SBC #1 

STA CYCLES 
BCS SKIP4 

DEC CYCLES+1 

- SKIP4 

LDA CYCLES 
BNE STAGE1 

LDA CYCLES+1 
BNE STAGEI 

- FLAGCLEAR 
DEC POWER 

BNE STAGE2 
RTS: J 

NEXT PASS 


Sort Routines 


\ BREE EERE HEE EEE 
\ INCREMENT 
\LOOPCOUNT 


ERE RHEE RR 
\ADD 4 TO 
\POINTERI 


\ FETE FETE HE SE HE ETE HE HE HE EEE 
\ADD 4 TO 
\POINTER2 


\ EEE EEE HH EH 
\COMPARE 
\LOOPCOUNT TO 
\CYCLES 

\NIF <> BRANCH 
\TO INNERLOOP 


\IF SW.FLG CLEAR 
\BR. FLAGCLEAR 


\ EEE EHH HEE HEE 
\DECREMENT 
\CYCLES 


ee ee 
\IF CYCLES <+>0 
\THEN BRANCH 

\TO MIDLOOP 

\VIA STAGEI1 

\ FETE EE HE HEHE EEE HE HEE 
\DECREMENT POWER 
\IF>O BR.STAGE2 


REM BASIC TEST PROGRAM 


MODE4 
CLS 


Program 6.4. contd 
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1650 
1660 
1670 
1680 
1690 
1700 
1710 
1720 
1730 
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INPUT"NUMBER OF STRINGS ",NUMBERZ 
PRINT 

DIM ARRAYS (NUMBERZ) 

FOR NZ=1 TO NUMBER? 

strings="" 

FOR ZZ=1 TO RND(10) 
K$=CHRS (RND (26) +64) 
stringt=string$tks$ 

NEXT ZZ 


1740 ARRAYS (NZ) =strings 
1750 PRINT ARRAYS (NZ) 
1760 NEXT NZ 

1770 PRINT 

1780 PRINT "SORTING" 
1790 PRINT 

1800 STARTZ=TIME 


1810 
1820 
1830 
1840 
1850 
1860 
1870 
1880 
1890 


CALL SORT, NUMBERZ, ARRAYS (1) 
timeZ=TIME-STARTZ 

FOR NZ=1 TO NUMBERZ 

PRINT ARRAYS (NZ) 

NEXT 

PRINT 
PRINT"STRINGS= 
PRINT 
PRINT"SORTING TIME= 


"3 NUMBERZ 


"stimeZ/1005 " 


SECONDS" 


Program 6.4. contd 


Since the overall structure is similar to the previous program, a detailed 
line by line analysis is unnecessary. 


Merge sort of an unsigned BASIC floating point array 


Programs which handle records or tables often need to sort unsigned 
floating point numbers (most numerical values in such programs are 
unsigned). It is, therefore, important to have at least an outline 
understanding of how floating numbers are stored by the BASIC 
interpreter. A floating point number consists of a mantissa and an 
exponent. Four bytes are allocated to the mantissa and one byte for the 
exponent. The most significant bit of the exponent is the sign bit and is in 
reverse two’s complement. That is to say, a negative exponent has ‘0’ as the 
sign bit; a ‘I’ indicates a positive exponent. The reason for this rather 
strange practice is that the maximum possible negative exponent closely 
approaches zero. This means that zero can be loosely taken as the most 
negative exponent. Therefore less negative exponents through to positive 
exponents correspond to a progression of increasingly larger exponents. 

From a mathematical viewpoint, a mantissa is always positive so no sign 
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bit is required. Therefore, the sign bit in the mantissa is used to denote the 
sign of the entire number in conventional two’s complement form. 


How floating point variables are stored 
Floating point variables are stored by the interpreter in a five-byte form, 
the details of which are below: 


[J $$ —————————_sincreasing Memory ———_»> 


Exponent Mantissa (4 bytes) 





COMPARE EXPONENTS OF FIRST FP NUMBER TO 
THAT OF SECOND FP NUMBER (1 BYTE) 
iN 
YES DESCENDING 
ORDER 


EXPONENTS THE NO 
SAME 


| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| THAT OF SECOND NUMBER 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 







! 
DESCENDING 
ORDER 
SET SWAP FLAG 
SWAP FP NUMBERS A BYTE AT A TIME 
(5 BYTES) 


Fig. 6.7. Block 8 expansion for unsigned floating point merge sort. 
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Block 8 of the flowchart in Fig. 6.4 is replaced by Fig. 6.7. This is the only 
essential difference in the algorithm other than the addition of 5 instead of 4 
to the address pointers POINTER! and POINTER2 when appropriate. 
The complete listing is shown in Program 6.5. and requires the following 
BASIC parameters: 


CALL SORT, NUMBER%, ARRAY (1) 


The variables used are arbitrary but must be in the order given. The listing 1s 
liberally ‘remarked’ and, hopefully, should require no further explanation. 


>LIST 
10 REM MERGE SORT OF UNSIGNED 
20 REM FLOATING POINT NUMBERS 


30 NUMBER=&%70: POINTER1=&72: POWER=&74 

40 POINTER2=%75: SI ZE=&77: LOOPCOUNT=&7 
9: FLAG=&7B: STORE=&7C: CYCLES=&7E 

30 DIM SORT 3500 

60 FOR PASS=0 TO 2 STEP 2 

7O PZ=SORT 


BO CORT PASS \ HEE HEHRHKRRHEHREREEE 
90 LDA &0601 \GET NUMBER OF 
100 STA STORE \FP NUMBERS 

110 LDA &0602 \IN ARRAY AND 
120 STA STORE+1 \STORE IN NUMBER 
iZO LDY #0 

i140 STY SIZE+1 

i50 STY POWER \ALSO INITIALISE 
160 LDA (STORE),Y \SIZE AND POWER 
170 STA NUMBER 

180 INY 

i970 STY SIZE 

200 LDA (STORE),Y 

7210 STA NUMBER+1 
2270 -ol Z7ELOOP \ HERE ERHREHEEEEREE 
2320 INC POWER \FIND NEXT POWER 
240 ASL SIZE \OF 2 >= NUMBER 
250 ROL SIZE+1 \AND STORE IN 
260 SEC \SIZE 

270 LDA SIZE 

7280 SBC NUMBER 

290 LDA SIZE+1 

300 SBC NUMBER+1 

210 BCC SIZELOOP 

rs @) . QUTERLGOP \SREEHRHEKEHKERHEE 
330 LSR SIZE+1 \DIVIDE SIZE 

340 ROR SIZE \BY 2 

350 SEC \ HERR EEEERRKERHEEE 
360 LDA NUMBER \SUBTRACT SIZE 
370 SBC SIZE \FROM NUMBER AND 
380 STA CYCLES \STORE IN CYCLES 


Program 6.5. Merge sort of an unsigned floating point array. 


390 
400 
410 
420 
430 
440 
450 
460 
470 
480 
490 
JOO 
31Q 
320 
var i © 
340 
uO 
360 
u3/7O 
380 
2770 
600 
610 
620 
630 
640 
650 
740 
735Q 
760 
779 
780 
79O 
BOO 
810 
B20 
B30 
B40 
850 
8460 
870 
B80 
890 
900 
F719 
920 
FS0 
940 
FaO 
960 
¥7O 


iDA NUMBER+1 
SBC SIZE+1 

STA CYCLES+1 

. MIDLOOP 

LDA #0 

STA FLAG 

STA LOOPCOUNT 
STA LOOPCOUNT+1 
LDA &0604 

STA POINTERI 
LDA &0605 

STA FPOINTERI+1 
LDA SIZE+1 

STA STORE+1 
LDA SIZE 

ASL A 

ROL STORE+1 
ASL A 

ROL STORE+1 
oe 
ADC 
ADC 
STA 
LDA 
ADC 


SIZE 
POINTER1 
POINTER2 
STORE+1 
SIZE+1 

ADC POINTERi+1 
STA POINTER2+1 

. INNERLOOP 

LDY #0 

LDA (POINTER2),Y 
CMP (POINTERI).Y 
BCC sSwoP 

BNE NOSWOP 

LDY #4 

SEC 

. COMPLOOP 

LDA (POINTER2),Y 
SBC (POINTERI).Y 
DEY 

BNE COMPLOOP 

BCS NOSWOP 

. SWOP 

LDY #4 

STY FLAG 

. SWOPLOOP 

LDA (POINTER1),Y 
TAX 
LDA 
STA 
TXA 
STA 


(POINTER?) ,Y 
(POINTERI),Y 


(POINTER?) ,Y 
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\ FE EE IEE E EE FEE HEHEHE HEHE 
NINITIALISE 
\SWOP FLAG AND 
\LOOPCOUNT 

\ FEE EEE HEE HEHE HH 
\STORE START 
\ADDRESS OF 

\FP ARRAY IN 
\POINTER!I 

\ FEE HEE EEE HEHE HEHE HE 
\MULTIPLY SIZE 
\BY 4 AND ADD 
\SIZE THUS 
\OBTAINING AN 
\EFFECTIVE 
\MULTIPLICATION 
\OF SIZE BY 93 
\STORE RESULT 
\IN POINTER2 


\ HEHE HEHE HE HE HE HEHE HE HEHEHE HEE 
\COMPARE 
\EXPONENT OF 
\PAIR OF FP 
\NUMBERS 


\ JERE TE FETE FETE EEE EE HE HEE 
\SUBTRACT 
\MANTISSI 

\A BYTE AT A 
\TIME KEEPING 
\TRACK OF THE 
\CARRY FLAG 

\AT COMPLETION 

\ EHR 
\SET SWOP FLAG 


\ HEHE EEE EEE 
\SWOP FP NUMBERS 
\A BYTE AT A 
\NTIME(S BYTES) 


Program 6.5. contd 
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980 

990 
1000 
1010 
1020 
1030 
1040 
1050 
1060 
1070 
1080 
1090 
1100 
1110 
1120 
1130 
1140 
1150 
1160 
1170 
1180 
1190 
1200 
1210 
1220 
1250 
1240 

20 
1260 
1270 
1280 
1290 
1300 
1310 
1320 
1320 
1340 
1350 
1360 
1370 
1380 
i390 
1400 
1410 
1420 
1430 
1440 
1450 
1460 
1470 
1480 





DEY 
RPL SWOPLOOP 
BMI NOSWOP 


. STAGE 1 

BNE MIDLOOP 

- STAGE? 

BNE OUTERLOOP 


. NOSWOP 

INC LOOPCOUNT 
BNE SKIPi 

INC LOOPCOUNT+1 
.SKIP1 

LDA POINTERI 
ELE 

ADC #5 

STA POINTERI 
BCC SKIP2 

INC POINTERI+1 
.SKIP2 

LDA FPOINTER2 
cLec 

ADC #5 

STA POINTER2 
BCC SKIPS 

INC POINTER2+1 
. SKIPS 

LDA CYCLES 
CMP LOOPCOUNT 
BNE INNERLOOP 
LDA CYCLES+1 
CMP LOOPCOUNT+1 
BNE INNERLOOP 
LDA FLAG 

BEG FLAGCLEAR 
SEC 

LDA CYCLES 
SBC #1 

STA CYCLES 
BCS SKIP4 

DEC CYCLES+1 
. SKIF4 

LDA CYCLES 
BNE STAGE1 
LDA CYCLES+1 
BNE STAGE1 

. FLAGCLEAR 
DEC POWER 

BNE STAGE? 
RTS: J 
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\OUT OF RANGE 
\ BRANCH PATCHES 


\ EEE 
\ INCREMENT 
\LOOPCOUNT 


\ EEE EEE 
\ADD 3S TO 
\POINTER!I 


\ FEE IEE EE EE HEE EEE HE 
\ADD 3S TO 
\POINTER2 


\ REE HEHEHE EERE 
\ COMPARE 
\LOOPCOUNT TO 
\CYCLE 

\IF <> BRANCH 
\TO INNERLOOP 


\IF SW.FLG CLEAR 
\BR.FLAGCLEAR 


\ FEE EEE EEE 
\DECREMENT 
\CYCLES 


\ REE EERE EEE EEE 
\IF CYCLES <-0 

\ THEN BRANCH 

\TO MIDLOOP 

\VIA STAGE! 

\ HEE EEE 
\DECREMENT POWER 
\IFS>O BR.STAGE? 


Program 6.5. contd 
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1490 NEXT PASS 

13500 

i510 REM BASIC TEST FPROGRAM 

i520 MODE4 

foo CLS 

1540 INPUT"NUMBER OF FP ELEMENTS ",NUMB 
ERZ 

1550 PRINT 

i560 DIM ARRAY (NUMBER) 

i370 FOR NAZ=1 TO NUMBERZA 

1580 ARRAY (NZ) =ABRS(RND) *1E-9 

i590 PRINT ARRAY (NZ) 

1600 NEXT 

1610 PRINT 

1620 PRINT "SORTING" 

1620 PRINT 

1640 STARTA=TIME. 

1650 CALL SORT, NUMBERZ, ARRAY (1) 

1660 timeZ=TIME-STARTZ 

1670 FOR NZ=1 TO NUMBERZ 

1680 PRINT ARRAY (NZ) 

1490 NEXT 

1700 PRINT 

1710 PRINT"SORTING TIME= "stime#/100; " 

SECONDS" 

1720 PRINT 


1720 PRINT"NUMBER= "3; NUMBERZ 


Program 6.5. contd 


Machine code sort routines applied to BASIC multifield filing 
programs 


There are two common methods of generating multifield records in BASIC. 
One is to use fields of fixed length substrings where the whole record is 
stored as one string array element. The other is to create a two-dimensional 
string array where the records occupy one dimension and the fields occupy 
the other. This is often referred to as the row/column file format. Both have 
advantages and disadvantages. The former method is more economical 
when storing records since only one String Information block is set up per 
record by the interpreter. On the other hand, the BASIC programming can 
be tedious and expensive on memory. The latter method makes for concise 
programming in BASIC but is heavy on String Information blocks (the 
number of fields multiplied by the number of records). It is a matter of 
personal preference which method is used, so a machine code merge sort 
routine to handle each type of record format will be given. The requirement 
of any routine of this type is to sort entire records according to any specified 
field, therefore additional calling parameters will be necessary. 
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Merge sort of multifield fixed-length records 

The complete source code listing and BASIC test routine are given in 
Program 6.6. The overall structure is similar to that of previous routines 
with the extra coding ‘remarked’ on the listing. The machine code call is 
executed from BASIC via the call statement: 


CALL SORT, NUMBER%, FIELD%, FIELDEND%, ARRAY3(1) 
where: 


SORT=the start address of the routine. 

NUMBER%the number of records in the array. 

FIELD%=the first character position of the field in the string 
array element. 

FIELDEND%=the last character position of the field in the string 
array element. 

ARRAYS$ (1)=the first usable element in the array. 


By convention, the zero element is reserved for headings, labels, etc. 


IST 
10 REM MERGE SORT OF MULTIFIELD 


20 REM FIXED LENGTH RECORDS 

30 NUMBER=%70: POINTER1=&72: POWER=&74 

40 POINTER2=&75: SI ZE=&77: LOOPCOUNT=&7 
9: FLAG=%&7B: STORE=&7C: CYCLES=&7E 

290 string1=&80: string2=%82: FIELD=&84: 
FIELDEND=&85 

60 DIM SORT S00 

7O FOR PASS=0 TO 2 STEP 2 


80 P%=SORT 
90 [COPT PASS BERRA HERR HHH 
100 LDA &0601 \GET NUMBER OF 
110 STA STORE \RECORDS IN THE 
120 LDA &0602 \BASIC ARRAY AND 
i130 STA STORE+1 \STORE IN NUMBER 
140 LDY #0 

i50 STY SIZE+1 
160 STY POWER \ALSO INITIALISE 
170 LDA (STORE),Y \SIZE AND POWER 
i980 STA NUMBER 

190 INY 


200 STY SIZE 
210 LDA (STORE),Y 
220 STA NUMBER+1 


250 LDA &0604 \ PEPE SEE HE EEE IEE EE HEHE HE 
240 STA STORE \STORE FIELD 

290 LDA &0605 \START POSITION 
260 STA STORE+1 \IN FIELD 

270 DEY 
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280 
290 
300 
310 
320 
330 
340 
350 
360 
370 
380 
390 
400 
410 
420 
430 
440 
450 
460 
470 
480 
490 
500 
510 
520 
530 
540 
550 
560 
570 


3970 
600 
610 
620 
630 
640 
650 
660 
670 
6380 
590 
700 
710 
720 
730 
740 
750 
760 
770 
780 


LDA 
STA 
LDA 
STA 
LDA 
STA 
LDA 
STA 


(STORE) ,Y 
FIELD 
&0607 
STORE 
%0608 
STORE+1 
(STORE) ,Y 
F IELDEND 


- SIZELOOP 


INC 
CLC 
ASL 
ROL 
SEC 
LDA 
SBC 
LDA 
SBC 
BCC 


POWER 


SIZE 
SIZE+1 


SIZE 
NUMBER 
SIZE+1 
NUMBER+ 1 
SIZELOOP 


.- OUTERLOOP 


CLC 
LSR 
ROR 
SEC 
LDA 
SBC 
STA 
LDA 
SBC 
STA 


SIZE+1 
SIZE 


NUMBER 
SIZE 
CYCLES 
NUMBER+1 
SIZE+1 
CYCLES+1 


- MIDLOOP 


LDA 
STA 
STA 
STA 
LDA 
STA 
LDA 
STA 
LDA 
STA 
LDA 
STA 
i DX 


#0 

FLAG 
LOOPCOUNT 
LOOPCOUNT+1 
&060A 
POINTERI 
&060B 
POINTERiI+1 
SIZE 
STORE 
SIZE+1 
STORE+1 

#2 


- MULT4 


CLC 
ASL 
ROL 
DEX 
BNE 
CLC 


STORE 
STORE+1 


MULT4 
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\ EERE EE EERE 
\STORE FIELD 
\NEND POSITION 
\IN FIELDEND 


\\ FETE ETE EEE IEEE FEE HE HE 
\FIND NEXT POWER 


\OF 2 >= NUMBER 
\AND STORE IN 
\SIZE 


\ EEE EE EEE EE EEE 
\DIVIDE SIZE 
\BY 2 


\, FE HE FE HE HE HE FEE HEHEHE HE HE HEE 
\SUBTRACT SIZE 

\FROM NUMBER AND 
\STORE IN CYCLES 


\ 2636 FE IE SE HE HE TE HE ETE HE HE HEE 
\SET SWOP FLAG 
\NAND LOOPCOUNT 
\TO ZERO 

\ HEE FEE TE IE SE HE TEE TEE HEHE HE 
\STORE START 
\NADDRESS OF 

\ $% INF.BLOCK IN 
\POINTER!1 

\ FEE HEHEHE HE HEE EE HE HE HE HE HE 
\MULTIPLY SIZE 
\BY 4 AND ADD TO 
\POINTER1 STORE 
\RESULT IN 
\POINTER2 


Program 6.6. contd 
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790 LDA POINTER! 
B00 ADC STORE 

810 STA POINTER2 
820 LDA POINTERI+1 
830 ADC STORE+1 
840 STA POINTER2+1 


850 .INNERLOOP \ HEHEHE EERE 
860 LDY #0 \OBTAIN ADDRESS 
870 LDA (POINTERI),Y \OF EACH OF 

880 STA stringi \PAIR OF STRINGS 


890 LDA (POINTER2),Y 
900 STA string2 

910 INY 

920 LDA (POINTERI),Y 
930 STA stringitl 
9740 LDA (POINTER2) ,Y 
950 STA string2tl 


960 LDY FIELD \NINIT.Yreg FIELD 
970 DEY 
980 .COMPLOOP \COMPARE FIELDS 


990 LDA (string2),Y \A CHARACTER AT 
1000 CMP (stringli),Y \A TIME IF 

1010 BCC SWOP \NECESSARY 

1020 BNE NOSWOP 

1030 INY 

1940 CPY FIELDEND 

1050 BNE COMPLOOP 

1060 BEQ NOSWOP 


1070 \--------------------------------- 
1080 .STAGE1 \GUT OF RANGE 
1090 BNE MIDLOOP \BRANCH PATCHES 


1100 .STAGE2 
1110 BNE OUTERLOOP 


1120 \---------------------- 
1130 .SWOP \ HEHEHE ERE EEE HEH 
1140 LDY #3 \SET SWOP FLAG 
1150 STY FLAG 

1160 .SWOPLOOP \REEMREMERRKREHE BES 
1170 LDA (POINTERI),Y \SWOP STRING 
1180 TAX \ INFORMATION 


1190 LDA (POINTER2),Y \BLOCK A BYTE AT 
1200 STA (POINTERI),Y \A TIME(4 BYTES) 
1210 TXA 

1270 STA (POINTER2),Y 

i230 DEY 

1240 BPL SWOPLOOP 

1250 .NOSWOP SME ERR ER EER EERE E 
1260 INC LOOPCOUNT \ INCREMENT 

1270 BNE SKIP \LOOPCOUNT 


1280 INC LOGPCOUNT+1 
1290 .SKIP VERARHR ERE HERE HE 


Program 6.6. contd 


1300 
1310 
1320 
1330 
1340 
1350 
1360 
1370 
1380 
1390 
1400 
1410 
1420 
1430 
1440 
1450 
1460 
1470 
14680 
1490 
1500 
1510 
1520 
1530 
1540 
1550 
1560 
1570 
1580 
1590 
1600 
1610 
1620 
1630 
1640 
1650 
1660 
1670 
1680 
1690 
1700 
1710 
1720 
1730 
1740 
1750 
1760 
1770 
1780 
1790 
1800 


LDA 
CLC 
ADC 
STA 
BCC 
INC 


POINTER1 


#4 
POINTERI 
SKIP2 
POINTERI+1 


»SKIP2 


i_DA 
CLC 
ADC 
STA 
BCC 
INC 


POINTER2 


#4 
POINTER2 
SKIPS 
POINTER2+1 


- SKIPS 


LDA 
CMP 
BNE 
LDA 
CMP 
BNE 
LDA 
BEQ 
SEC 
LDA 
SBC 
STA 
BCS 
DEC 


CYCLES 
LOOPCOUNT 
INNERLOOP 
CYCLES+1 
LOOPCOUNT +1 
INNERLOOP 
FLAG 
FLAGCLEAR 


CYCLES 
#1 
CYCLES 
SKIP4 
CYCLES+1 


- SKIP4 


LDA 
BNE 
LDA 
BNE 


CYCLES 
STAGE1 
CYCLES+1 
STAGE 1 


- FLAGCLEAR 


DEC 
BNE 


RTS: 


POWER 
STAGE2 
J 


NEXT PASS 
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\ADD 4 TO 
\POINTERI 


\ EEE EHE HEHEHE EEE 
\NADD 4 TO 
\POINTER2 


\ HEHEHE ETE TE EE EEE HE EE 
\COMPARE 
\LOOPCOUNT TO 
\CYCLES 

\IF <> BRANCH 
\TO INNERLOOP 


\IF SW.FLG CLEAR 
\BR. FLAGCLEAR 


\ HEFCE HE HE HE HEHE EEE HE HEE 
\DECREMENT 
\CYCLES 


\ EEE EEE EEE EH 
\IF CYCLES <>0 
\THEN BRANCH 

\TO MIDLOOP 

\VIA STAGE! 

\ HEHEHE EHH EH 
\DECREMENT POWER 
\IF>O BR.STAGE2 


REM BASIC TEST ROUTINE 
MODE4 


CLS 


INPUT"NUMBER OF STRINGS ",NUMBERZ 
DIM ARRAYS (NUMBERZ) 
FOR NZ=1 TO NUMBERZ 
strings="" 


FOR 


ZZ=1 TO 10 


K$=CHRS (RND (26) +64) 


string$=stringStkKs$ 


NEXT ZZ 


ARRAYS (NZ) =strings 


Program 6.6. contd 
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1810 PRINT ARRAYS (NZ) 

1820 NEXT Nx 

1830 INPUT"GIVE FIELD START POS. ",FIE 
LD% 

1840 INPUT"GIVE FIELD END POS. " FIE 
LDEND% 

1850 PRINT "SORTING" 

1960 START%=TIME 

1870 CALL SORT, NUMBER%, FIELD%, FIELDEND~% 
, ARRAYS (1) 

1880 time%Z=TIME-START~% 

1890 FOR N%=1 TO NUMBERZ 

1900 PRINT ARRAYS (NZ) 

1910 NEXT N% 

1920 PRINT"SORTED FIELD BEGINS AT CHARA 


CTER "SFIELDZ 

1930 PRINT"SORTED FIELD ENDS WITH CHARA 
CTER "“;FIELDENDZ 

1940 PRINT”"NUMBER OF RECORDS= ";NUMBERZ 


1950 PRINT"SORTING TIME= "5 timeZ/100;”" 
SECONDS" 


Program 6.6. contd 


Merge sort of a two-dimensional string array 

The interpreter stores the String Information blocks corresponding to 
multidimensional string array elements sequentially in memory. The 
following series shows the order in which they occur for a two-dimensional 
string array: 


A$ (0,0) , A$ (0,1) , A$ (0,2) , A$ (1,0) , AS (1,1) , A$ (1,2) ...... A$ (R,C) 
If a file is DI Mensioned in BASIC: 
ARRAY$ (ROWNUM%, COLNUM%) 


then a file ARRAY$ can be considered as containing ROWNUM% records 
and COLNUM% fields. 
We can define a specific field of a specific record by: 


ARRAY$ (RECORD%, FIELD%) 
If we call the sort routine with 
CALL SORT, ROWNUM%, COLNUM%, FIELD%, ARRAY (1,0) 


all the necessary parameters required to sort all the records by field are 
passed. The complete source code listing and BASIC test routine are shown 
in Program 6.7. 
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Since each String Information block occupies four bytes and there are 
COLNUM&%-+1 fields to each record, the sort routine will need to calculate 
the number of bytes necessary before swopping the SI blocks 
corresponding to each record. This is performed by lines 380 to 480 and the 
result is stored in NUMBYTE (1 byte). 

The by now familar SI block address pointers POINTERI and 
POINTER? refer to the zeroth dimension of COLNUM%soan offset needs 
to be calculated by the routine to point to the required sort field element SI 
block in the COLNUM% dimension (remember that 4 bytes offset is 
required for each). This is performed by lines 490 to 560 and the result is 
stored in offset (1 byte). Using indirect indexed addressing, the offset 
accesses the required field SI block position. Plainly, we will need to add 
NUMBYYTE instead of 4 to the previous SI block address pointers in order 
to access the next record. Apart from these differences, the overall 
structure is similar to that previously described. 

The routine can sort records with up to a maximum of 128/4=32 fields 
(not much of a handicap in practice). Program 6.7 has been well-tried and 
tested in a practical filing system and will sort a computer full or records in 
less than a second. 


>LIST 
10 
20 
50 
40 
rar @ 
60 


MERGE SORT OF A TWO 
DIMENSIONAL STRING ARRAY 
row/column record format 
sorting entire record (row) 
REM according to any specified 
REM field (column) 

70 NUMBER=%70: POINTER1=&72: POWER=&74 

80 POINTER2Z=%75: SI ZE=&77: LOOPCOUNT=&7 
9: FLAG=&7B: STORE=&7C: CYCLES=&7E 

90 string1=%80: string2=&82: length1=&8 
4: length2=%85: NUMBY TE=&86: of f set =&87 


REM 
REM 
REM 
REM 


100 DIM SORT 500 

110 FOR PASS=0 TO 2 STEP 2 

120 PZ=SORT 

130 COPT PASS \ HEHEHE IE EE HE HE HE HE HE HEE HEE 
140 LDA &0601 \GET NUMBER OF 
150 STA STORE \BASIC STRINGS 
160 LDA &0602 NIN ARRAY AND 
170 STA STORE+1 \STORE IN NUMBER 
1860 LDY #0 

190 STY SIZE+1 

200 STY POWER \NALSO INITIALISE 
210 LDA (STORE),Y \SIZE AND POWER 
220 STA NUMBER 

230 INY 

240 STY SIZE 

250 LDA (STORE) ,Y 


Program 6.7. Merge sort of a two-dimensional string array. 
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260 
270 
280 
290 
300 
310 
320 
330 
340 
330 
360 
370 
380 
390 
400 
410 
420 
430 
440 
450 
460 
470 
430 
490 
300 
2710 
320 
230 
340 
UO 
260 
370 
380 
390 
600 
610 
620 
630 
640 
650 
660 
670 
680 
690 


7190 
720 
730 
740 
750 
760 


STA 


NUMBER+1 


. SIZELOOP 


INC 
CLC 
ASL 
ROL 
SEC 
_DA 
SBC 
LDA 
SBC 
BCC 
LDA 
STA 
LDA 
STA 
LDY 
LDA 
CLC 
ADC 
ASL 
ASL 
STA 
LDA 
STA 
LDA 
STA 
LDA 
ASL 
ASL 
STA 


POWER 


SIZE 
SIZE+1 


SIZE 
NUMBER 
SIZE+1 
NUMBER+ 1 
S1ZELOOP 
%&0604 
STORE 
&0605 
STORE+1 
#0 
(STORE) ,Y 


#1 

Ba 

oy 
NUMBYTE 
&0607 
STORE 
&0608 
STORE+1 
(STORE), Y 
ee 
offset 


.- OUTERLOOP 


CLC 
LSR 
ROR 
SEC 
LDA 
SBC 
STA 
LDA 
SBC 
STA 


SIZE+1 
SIZE 


NUMBER 
SIZE 
CYCLES 
NUMBER+1 
SIZE+1 
CYCLES+1 


. MIDLOOP 


LDA 
STA 
STA 
STA 
LDA 
STA 
LDA 


STA POINTERI+1 


#0 
FLAG 


LOOPCOUNT 
LOOPCOUNT+1 


&060A 
POINTER! 
%&060B 


\ FEE HEHE HE HE HEE TE HE HEHE HE HE HE 
\FIND NEXT POWER 
\OF 2 >= NUMBER 
\AND STORE IN 
\SIZE 


\ HEHEHE IE IE HE HE HEE EE HE HE HE 
\GET NUMBER OF 
\FIELDS IN 
\RECORD THEN 
\ADD 1 AND 
\MULTIPLY BY 4 
\STORE RESULT 
\IN NUMBYTE 


\ EEE HERR R EEE 
\GET SORT FIELD 
\NUMBER THEN 
\MULTIPLY BY 4 
\STORE RESULT IN 
\of fset 


\ 26 HE HEHE HEE EE EEE HE HE HE HE 
\DIVIDE SIZE 
\BY 2 


\ 26 FE FE HE HE EE EEE HE HEE HE HE 
\SUBTRACT SIZE 

\FROM NUMBER AND 
\STORE IN CYCLES 


\ FE FE HEHE HE HEHEHE HE HEHE HE HE HE HE 
\SET SWOP FLAG 
\AND LOOPCOUNT 
\TO ZERO 

\ FETE TEE HE HE EEE EEE EE 
\STORE START 
\NADDRESS OF 

\ % INF.BLOCK IN 
\POINTER!1 


Program 6.7. contd 


7790 LDA 
780 STA 
7990 STA 
800 LDX 


#0 
STORE 
STORE+1 
NUMBY TE 


810 .MULTLOOP 


820 CLC 
830 LDA 
B40 ADC 
850 STA 
860 LDA 
870 ADC 
880 STA 
890 DEX 
900 BNE 
910 CLC 
920 LDA 
930 ADC 
940 STA 
950 LDA 
960 ADC 
970 STA 


STORE 
SIZE 
STORE 
STORE+1 
Size+i 
STORE+1 


MUL TLOOP 


POINTERI 
STORE 
POINTER2 
POINTERI+1 
STORE+1 
POINTER2+1 


9780 . INNERLOOP 


9790 LDY 
1000 LDA 
1010 STA 
1020 LDA 
1030 STA 
1040 INY 
1050 LDA 
1060 STA 
1070 LDA 
1080 STA 
1090 INY 
1100 INY 
1110 LDA 
1120_STA 
1130 LDA 
1140 STA 
1150 LDY 


offset 
(POINTERI),Y 
stringl 
(POINTER2) ,Y 
string2 


(POINTERI),Y 
stringitl 
(POINTER2),Y 
string2t+li 


(POINTERI),Y 
lengthi 
(POINTER2),Y 
length2 

#0 


1160 .COMPLOOP 


1170 LDA 
1180 CMP 
1190 BCC 
1200 BNE 
1210 INY 
tzZ2y Gry 
i230 BEG 
1240 CPY 


(string2),Y 
(stringl1),Y 
SWOP 

NOSWOP 


lengthi 
NOSWOP 
length2 
SWOP 
COMPLOOP 


Sort Routines 


\ FEE EEE IE HEHE EEE EE HEE 
\MULTIPLY SIZE 
\BY NUMBYTE 

\NAND ADD TO 
\POINTERI 

\STORE RESULT IN 
\POINTER2 


\ EAE FE FE HE EE EEE HEE EH 
\OBTAIN ADDRESS 
\AND LENGTH 

\OF EACH OF 
\PAIR OF STRINGS 


\ AE FE FETE FE FE FE EE IE HE SE HEE HE 
\COMPARE STRINGS 
\A CHARACTER AT 
\A TIME IF 
\NECESSARY 


Program 6.7. contd 
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1280 
1290 
1300 
1310 
1320 
1330 
1340 
1350 
1360 
i370 
1380 
1390 
1400 
1410 
1420 
1430 
1440 
1450 
1460 
1470 
1480 
1490 
1500 
15190 
1520 
1530 
1540 
1550 
1560 
1570 
1580 
1590 
1600 
1610 
1620 
1630 
1640 
1650 
1660 
1670 
1680 
1690 
1700 
1710 
1720 
izs0 
1740 
1750 
1760 
1770 
1780 


- STAGE1I 

BNE MIDiLOOF 

. STAGE? 

BNE OUTERLOOP 


. SWOP 

LDY NUMBYTE 

DEY 

STY FLAG 

- SWOPLOOP 

LDA (POINTERI),Y 
TAX 
LDA 
STA 
TXA 
STA 
DEY 
BPL SWOPLOOP 

.- NOSWOP 

INC LOOPCOUNT 
BNE SKIP 

INC LOOPCOUNT+1 
. SKIP 

LDA POINTERI 
CLC 

ADC NUMBYTE 
STA POINTER! 
BCC SKIP2 

INC POINTERI+1 
. SKIP2 

LDA POINTER2 
CLC 

ADC NUMBYTE 
STA POINTER2 
BCC SKIPS 

INC POINTER2+1 
- SKIPS 

LDA CYCLES 

CMP LOOPCOUNT 
BNE INNERLOOP 
LDA CYCLES+1 
CMP LOOPCOUNT+1 
BNE INNERLOOP 
LDA FLAG 

BEQ FLAGCLEAR 
SEC 
LDA 
SBC 
STA 
BCS 
DEC 


(POINTERZ2),Y 
(POINTERI),Y 


(POINTER2) ,Y 


CYCLES 
#1 
CYCLES 
SKIP4 
CYCLES+1 
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\OUT OF RANGE 
\BRANCH. PATCHES 


NINITIALISE BYTE 
\COUNTER TO 
\NUMBYTE 

\SET SWOP FLAG 

\ EEE EEE EEE EE 
\SWOP STRING 

\ INFORMAT ION 
\BLOCKS A BYTE 
NAT A TIME TILL 
\COMPLETE RECORD 
\IS SWOPPED 


\ FEE EE HE FE IE HE HE HE HE HE HEE HE 
\ INCREMENT 
\LOOPCOUNT 


\ FEE ETE IE EEE EE HEE HEHEHE 
\ADD NUMBYTE TO 
\POINTER1I 


\ EEE EEE EE EH 
\ADD NUMBYTE TO 
\POINTER2 


\ HEHEHE EEE EEE 
\COMPARE 
\LOOPCOUNT TO 
\CYCLES 

\IF <> BRANCH 
\TO INNERLOOP 


\IF SW.FLG CLEAR 
\BR. FLAGCLEAR 


\ CHEER ERRERHEHHEE 


\DECREMENT 


\CYCLES 


Program 6.7. contd 
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1790 .SKIP4 ee ee 
i800 LDA CYCLES \IF CYCLES <>0 
1810 BNE STAGEI1 \THEN BRANCH 
1820 LDA’ CYCLES+1 \TO MIDLOOP 

1830 BNE STAGEI \VIA STAGEI 

1840 .FLAGCLEAR NER REE EE EEE EEE 
1850 DEC POWER \DECREMENT POWER 
1860 BNE STAGE2 \IF>0O BR.STAGE2 
1870 RTS: ] 

i880 NEXT PASS 

1890 

1900 REM BASIC TEST ROUTINE 

i910 MODE4 

i920 CLS 

1930 INPUT"“NUMBER OF RECORDS ",ROWNUMZ 
1940 INPUT"NUMBER OF COLUMNS ",COLNUMZ 
i950 INPUT"“SORT WHICH FIELD ",FIELDZ 
1960 PRINT 

1970 DIM ARRAYS (ROWNUMZ, COLNUMZ) 

1980 FOR R%Z=1 TO ROWNUMZ 

1990 FOR CZ=0 TO COLNUMZ-1 

2000 stringt="" 

2010 FOR ZZ=1 TO RND(10) 

2020 KS$=CHRS (RND (26) +64) 

2030 strings=string$tks 

2040 NEXT ZZ 

2050 ARRAYS(RZ~, CZ) =strings 

2060 PRINT ARRAYS (RZ,CZ) 

2070 NEXT CZ 

2080 PRINT 

2090 NEXT RZ 

2100 PRINT "SORTING" 

2110 PRINT 

2120 STARTZ=TIME 

2130 CALL SORT, ROWNUM%, COLNUMZ,FIELDZ,A 
RRAY$ (1,0) 

2140 timeZ=TIME-STARTZ 

2150 FOR RZ=1 TO ROWNUMZ 

2160 FOR CZ=0 TO COLNUMZ-1 

2170 PRINT ARRAYS (RZ~,CZ) 

2180 NEXT CZ 

2190 PRINT 

2200 NEXT RZ 

2210 PRINT 

2220 PRINT" RECORDS= "3; ROWNUMZ 

2230 PRINT 

2240 PRINT"SORTING TIME= “stimeZ/1005 " 
SECONDS" 


Program 6.7. contd 
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Table of sorting times 


As a guide to the execution times you can expect from the sorting routines, a 
comprehensive table is given below for various array sizes. 


Table 6.1: Table of sorting times. 





Unsigned integer bubble sort 





Number of elements 100 300 1000 2000 3000 
Time (secs) reverse order 0.7 5.7 64 253 567 
Time (secs) random (average) 0.5 4.3 47 187 418 





String bubble sort 





Number of elements 100 300 1000 
Time (secs) random length strings 9.5 4.3 50 








Signed integer merge sort 





Number of elements 100 300 1000 2000 3000 
Time (secs) reverse order 0.09 0.3 1.4 3 5 
Time (secs) random (average) 0.3 1.2 8.5 22 47 





String merge sort 





Number of elements 100 300 1000 
Time (secs) random length strings 0.2 1.2 8.5 





Unsigned floating point merge sort 





Number of elements 100 300 1000 2000 
Time (secs) random (average) 0.2 1.2 8.0 24 
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Multi-field fixed length records 


Number of records 100 200 300 
Time (secs) random 02 07. 12 


Multi-field unlimited length records (2-dimensional array) 


Number of records 100 200 300 
Time (secs) 3-field records Oe 12 hf 
Summary 


1. 


BASIC and machine code mix well in the BBC system. 


2. There is little point in using machine code just for its own sake. Use 


10. 


BASIC for routine tasks which are not time-critical and slip into 
machine code for those that are. 


. The standardised unit of computer speed is the ‘mip’, meaning millions 


(of machine code instructions) per second. The BBC machine has an 
average speed of 0.5 mips. 


. Aspeed of 0.5 mips is good for personal microcomputers but slow in 


comparison with mainframes. Some can achieve speeds of 100 mips. 


. Sorting data into numerical or alphabetical order occupies about 30% 


of computer time in the commercial and scientific fields. 


. The bubble sort is simple and effective providing the number of items 


to be sorted is moderate (not > 100). 


. Integer array variables occupy 4 bytes. Bit 7 of the highest order byte is 


used for distinguishing sign. 


. String arrays are handled by a four-byte String Information Block 


which gives string length, maximum length and start address of the 
array. 


. For sorting large arrays of data items, the bubble method is slow. A 


much faster sort is obtained by the ‘merge’ method. 

The merge sort still uses bubble techniques but first splits up the array 
into small sets. The sorted sets are then progressively merged into 
larger sets until a single sorted set remains. 
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Self test 


6.1 


6.2 


6.3 


6.4 
6.5 


Name at least five factors which you consider might influence the 
processing power of a computer. 

Develop a comprehensive merge sort program which will sort signed 
integers, strings or floating point arrays as specified by a CALL 
statement. 

Develop a comprehensive bubble sort program which will sort signed 
integers. strings or unsigned floating point numbers. 

Develop a routine which will sort signed floating point numbers. 
Develop a bubble sort routine which will sort signed integers. 


Chapter Seven 

Using Subroutines, 
Macros and Look-up 
Tables 





User subroutines 


Machine code programs which are called from, and intended to return to, 
BASIC via RTS are essentially subroutines. However, it is a common 
requirement for the machine code program itself to use subroutines, either 
user-designed or one of the many resident subroutines embedded within 
the operating system. Subroutines designed by the user are called by JSR 
followed by an operand, either an absolute machine address (not 
recommended) or a destination label. As in BASIC, machine code 
subroutines can be nested one within the other. Enthusiasm for high 
nesting levels should not be carried to excess or the stack could overflow. 
Each unreturned JSR uses up two stack locations, storing the two-byte 
return address in the Program Counter. No provision is made in the 6502 
for saving the other registers. It is up to the programmer to make provisions 
for protecting valuable register data from corruption by the subroutine. 
Subroutines are best avoided altogether within loops which are time- 
critical. Each JSR squanders 6 clock cycles and RTS another six. It is far 
better to splice the code within the main program, even if it means writing 
the same segment of code several times. 


Resident subroutines 


Acorn strongly advise that programs affecting the input/output devices 
(screen, keyboard, printer, etc.) do so via the appropriate resident 
subroutines. Circumvention, by writing your own, is not dangerous but the 
program may not work if a second processor is added. This need not be an 
effective deterrent. You may never consider owning the second processor. 
You may feel that the advantages of writing your own input/output 
routines outweigh other considerations. There are certain speed advan- 
tages to be gained by using direct screen addressing techniques. Some 
fascinating (often bizarre) graphic displays can be produced. Objects can 


158 Advanced Machine Code Techniques for The BBC Micro 


be persuaded to move across the screen with far less, in fact almost 
imperceptible, flicker. However, it should be realised that writing your own 
code for screen animation is not going to be easy. The resident subroutines 
are excellent but we should bear in mind that design constraints are 
inevitable during the development of general purpose software. In the case 
of the BBC system, additional constraints are imposed by the problem of 
maintaining compatibility with the Tube. It comes down in the end to a 
question of personal choice whether you regard or disregard Acorn’s 
warning. The choice is simple: use the resident subroutines and feel safe, or 
bow to the spirit of adventure and experiment with your own. The 
designers of the BBC operating system have bent over backwards to 
provide free access to most of the ROM’s internal anatomy. It would be 
hard to find a competitive machine which offers more scope for 
experimentation. 

It would be pointless at this stage to plod wearily through the entire 
repertoire of operating system calls. The complete list appears in the User 
Guide but, for convenience, has been repeated in slightly different form in 
Appendix B. 

The practical programs that appear throughout this chapter should be 
useful for insertion as subroutines in BASIC programs employing graphics 
and sound. 


Vectors and indirection 


One term which crops up when reading literature on operating system calls 
is vector so it is important to be certain of its meaning: 


A vector is a word (normally 2 bytes) in memory which contains the 


address of a particular routine. 





Detailed specifications of routines might include something like: 
OSRDCH: Calling address=&FFEO : Indirected through &0210 


The word ‘indirected’ can be read as ‘internally redirected’ and, in the 
above case, refers to the vector in address &0210. Thus, although we would 
call OSRDCH at &FFEO, the code at that address is not OSRDCH but 
simply information where OSRDCH can be found. 

Why all this apparently needless treasure trail? The answer lies in 
flexibility. There are three possibilities for the machine code programmer: 


(1) Intercept the standard operating system call by simply changing the 
vector (changing the address in &0210 in the above example). Calling 
OSRDCH now at address &FFEO would call up a different routine, 
written by the user. 


A-LIST 
10 
20 
sO 
40 
30 
60 
7Q 


BO 
90 


100 
110 
120 
130 
140 
150 
160 
170 
180 
190 
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REM READ TEXT CURSOR POSITION 
REM USING AN OSBYTE CALL 
MODES 

DIM START 256 

OSBYTE=&FFF4 

XPOS=&70: YPOS=&71 


FOR PASS=0 TO 2 STEP 2 

PZ=START 

COPT PASS 

LDA #&86 \READ CURSOR POSITION 
JSR OSBYTE 

STX XPOS \STORE X AND Y REG’S 
STY YPOS \IN ZERO PAGE LOC’S 
RTS: J 

NEXT PASS 

REM RANDOMISE CURSOR POSITION 


VDUS1,RND (24) ,RND(24) 
CALL START a 
PRINT" XPOS="5 ?XPOS3" YPOS="3 7YPOS5 ) 


Program 7.7. Read text cursor position. 


(2) Intercept as before but using some preliminary code at the vectored 
address to modify the normal call. The original routine could then be re- 


entered. 


(3) Operating system ROMs can be updated or modified without affecting 
the original call. All that needs to be changed are the contents of the vector. 


21ST 
10 
20 
30 
40 
JO 
60 
7O 
80 
90 
100 
110 
120 
130 
140 
150 
160 
179 
180 
190 
200 
210 
220 
230 


REM TYPING PRACTICE PROGRAM 
MODE6 

DIM START 256 

OSASC I=&FFE3: OSRDCH=&FFEO 


OSNEWL=&FFE7 : OSWRCH=&FFEE 

FOR PASS=0 TO 3 STEP 3 

PZ=START 

COPT PASS 

LDA #12 

JSR OSWRCH \CLEAR SCREEN 

- BEGIN 

JSR OSRDCH \NACC=ASCIICKEY HIT) 
CMP #ASC("*") \COMP TO "“*" ASCII 
BEQ FINISH ~§ \BR. FINISH IF = 
JSR OSASCI \SEND TO SCREEN 

BNE BEGIN \BR. BEGIN IF <> 
«FINISH 

LDA #7 \ACC=ASCII FOR BELL 
JSR OSWRCH \OUTPUT ACCUMULATOR 
JSR OSNEWL NEXIT WITH NEW LINE 
RTS: ] | 
NEXT PASS 

CALL START 


Program 7.2. Typing practice. 
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Not all the system calls are indirected. Some of those that are include 
OSWRCH, OSRDCH, OSCLI, OSBYTE, OSWORD. Nearly all the 
vectors are situated in page 2 although there are a few which can extend 
into page &0D when using ROM paging. Note carefully that page &0D is 
often referred to as the ‘user subroutine area’. 

Three examples, using the simpler routines are given, with outline 
explanation, in the three following programs. Program 7.1 reads the 
current text cursor position using OSBYTE. Program 7.2isa simple typing 
practice program using OSASCI, OSRDCH, OSNEWL and OSWRCH. 
To exit the program, enter *. 


Using OSWORD 


Program 7.3 is an example using the sound generators with envelope 


10 REM USING THE SOUND GENERATORS 
20 REM WITH ENVELOPE SHAPING 
30 REM (LASER GUN TYPE NOISE) 


50 PZ=&1E00 

60 FOR item=1 TO 22 

70 READ Ds 

80 D=EVAL (D#) 

90 ?P%=D: PZ=P%+1 

100 NEXT item 

110 

120 DATA 1,0,1,0,200,0,3,0 

130 DATA 1,1,0,—-4,0,0,50,0,42,&F0,&FE, 
&FE, 126,94 

140 

150 OSWORD=&FFF 1 

160 FOR PASS=O TO 2 STEP 2 

170 P%=%1DO0O 


180 

190 COPT PASS 

200 LDA #8 \CALL OSWORD WITH 8 
210 LDX #&08 \IN THE ACCUMULATOR 
220 LDY #&1E \AND ENV PARA BLOCK 
230 JSR OSWORD \ADDRESS IN X AND Y 
240 

290 LDA #7 \CALL OSWORD WITH 7 
260 LDX #&00 \IN THE ACCUMULATOR 
270 LDY #&1E \AND SOUND PARA BLOCK 
280 JSR OSWORD \ADDRESS IN X AND Y 
290 

300 RTS: J 


310 NEXT PASS 
320 CALL &1DOO 


Program 7.3. Using the sound generator with envelope shaping. 
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shaping via a pair of OSWORD calls. Lines 50 to 130 set up the respective 
parameter block data at &1E0QO (an arbitrary address). The sound 
parameter block data is given in line 120 and the envelope parameter block 
data in line 130. The data given produce a laser ‘zapping’ sound, used 
extensively in many ‘shoot out of the sky’ types of game. 

Readers will no doubt be aware that this is the assembler equivalent of 
the SOUND and ENVELOPE statements used in BASIC. Notice that two 
bytes are used for each of the four SOUND parameters which are channel, 
envelope number, pitch and duration. The high-bytes are usually zero, 
except for the volume/envelope number parameter which can take a 
negative value, thus requiring &FF as the high-byte. On the other hand, the 
14 envelope parameter block data items are all single-byte entities. 

As a point of interest, alternative data for an explosion, gun shot and 
bonus signal are given below. 


Explosion data: 
120 DATA 0,0,1,0,6,0,5,0 
130 DATA 1,10,0,0,0,0,0,0,42,&F0,0,&FE, 126,94 


Gun shot data: 
120 DATA 0,0,1,0,5,0,4,0 
130 DATA 1,10,0,0,0,0,0,0, 126,—16,0,—16, 126,94 


Bonus signal data: 
120 DATA 1,0,1,0,200,0,50,0 
130 DATA 1,1,0,—20,0,0,10,0,0,0,0,—127,126,0 


Perhaps the most useful to the machine code programmer is OSWRCH so 
it deserves more detailed treatment. Any examples given assume that 
symbolic operands used, such as OSWRCH itself, have been prior- 
assigned in BASIC. Such names are of mnemonic value only. They are not 
recognised by the operating system until equated to a specific machine 
address, in this case, & FFFE. 


Using OSWRCH 


OSWRCH writes the ASCII character code in theaccumulatorto ‘the 
currently selected’ output device. 


The term ‘currently selected’ refers to either the screen, printer or RS423 
interface. The default condition is to screen and printer only. Other 
combinations can be achieved by a prior call to OSBYTE (see later). 

The way OSWRCH works is as follows: 


(a) Calling address & FFEE (indirected via &020E). 
(b) The A,X and Y registers have their contents preserved. 
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(c) The C,N,V and Z flags are undefined. 


Example 1: LDA #72 

JSR OSWRCH \Prints“H”on the screen 
Example 2: LDA ASC(“H”) 

JSR OSWRCH 


Relating OSWRCH to VDU codes 


However, OSWRCH is capable of much more than is suggested above. 
This is due to the cunning use of the 32 control codes which extend through 
the ASCII range 0 to 31. This code band was left vague when ASCII was 
launched way back in primeval times. It was felt that a degree of latitude 
was desirable at the bottom end to allow for individual hardware design. 
All the graphics facilities available in BASIC can be obtained in machine 
code by means of OSWRCH. Page 378 of the User Guide lists the VDU 
code summary. VDU statements can, amongst other things, control screen 
colour, define graphics windows and various x,y plotting operations. All 
these can be achieved in machine code by the use of OSWRCH. Columns | 
and 2 on Page 378 of the User Guide are the decimal and hex ASCII control 
codes. Column 3 relates to the CTRL keys, and column 4 is more or less 
useless. Column 5 —- ‘Bytes extra’ —- is particularly important for our 
purpose. All the codes have to place the appropriate ASCII code in the 
accumulator before using JSR OSWRCH. Some, however, require extra 
trips to OSWRCH, depending on the number of ‘extra bytes’. Codes which 
demand ‘0’ extra bytes are ‘one-trip’ excursions. Examples are: 


LDA #2 \ Equivalent to VDU 2 
JSR OSWRCH \Enable printer 


LDA #16 \Equivalent to VDU 16 
JSR OSWRCH \Clear graphics area 


LDA #12 \ Equivalent to VDU 12 
JSR OSWRCH \Clear text area 


VDU equivalents begin to bea little unruly when there are ‘extra’ bytes. 
Each extra byte involves the setting of another number in the accumulator 
and, of course, another trip to OSWRCH. Examples are: 
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LDA #17 \VDU 17,2 

JSR OSWRCH \Define text colour 2 
LDA #2 

JSR OSWRCH 


LDA #22 \VDU 22,5 

JSR OSWRCH \Set Mode 53S 
LDA #5 

JSR OSWRCH 


LDA #25 \VDU 25,0, 10035005 
JSR OSWRCH \PLOT K,x,y 
LDA #0 \K=0 (move relative to 
JSR OSWRCH \last point) 
LDA #100 \x=100 (low byte) 
JSR OSWRCH 

LDA #0 \x=0 (high byte) 
JSR OSWRCH 

LDA #244 \y=244 (low byte) 
JSR OSWRCH 

LDA #1 N\y=256 (high byte) 
JSR OSWRCH 


Users of BBC BASIC will be aware that VDU statements can be chained 
together. For example, 


VDU 22,2 followed by VDU 24,0;0; 1279; 767; 
can be written more economically as 
VDU 22,2,24,0;0;1279;767; 


Although OSWRCH can indeed simulate any VDU statement, there is 
no point in denying that its use, particularly when chaining lengthy 
examples, is tedious. In short, it appears to be a weary and ponderous task. 
Consider for a moment the tedium of typing in scores of LDAs and JSR 
OSWRCHs involved in drawing acomplex graphics screen. The end listing 
could eventually resemble a toilet roll. We need a routine where the 
computer does most of the work for us. There are two main methods of 
performing this task: one is to use a BASIC procedure acting as a macro; 
the other is to create a look-up table of data bytes that can be accessed by 
indexed addressing. The chain of VDU parameters could then be parcelled 
up neatly within DATA statements and then assembled into equivalent 
source code. 


The macro approach 


First, the macro method will be described. The routine is given in 
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Program 7.4 and is universal whichever BASIC ROM happens to be 
installed. However, a simpler version for those with a BASIC II ROM is 
given in Program 7.5. This modification is due to the introduction of the 
EQUS pseudo op-code. 

To be useful, the macro should have the following qualities: 


(1) It should utilise decimal or hex data. 

(2) It should handle single- or double-byte data automatically. 

(3) It should assemble single-byte labelled locations. 

(4) It should handle positive and negative data elements whether single- or 
double-byte in length. 


Referring to Program 7.4, an example list of VDU chains is put into 
DATA statements in lines 470 and 480. The data sets the text and graphics 
windows and constructs a yellow square on a blue backcloth in MODE 2. 
Where a two-byte VDU entity is required by the operating system (that is, a 
number followed by a ‘y rather than a ‘,’) the DATA element must be 
followed by a ‘@” so that the routine can differentiate between the two. 
Notice that negative decimal two-byte data can be used which is especially 
useful in relative plotting. Armed with this routine, a graphics screen can be 
planned out in BASIC and quickly changed to an assembly language 
version. The example also shows how labelled locations can be 
incorporated. The macro can be used once or many times during an 
assembly program by coming out into BASIC and typing PROCvdu(N) 
where N is the sequential number of DATA elements you require to 
incorporate at that particular time. It is essential, however, to restore the 
DATA pointer at the start of each pass of the assembler (line 290). 


>LIST 
10 REM CONDITIONAL ASSEMBLY PROGRAM 
20 REM FOR CHAINING VDU PARAMETERS 
30 GOTO240 
40 
20 DEFPROCvdu (N) 
60 LOCAL D,Dt,B,byte,item,lbyte 
7O FOR item=1 TON 
80 READ D# 
90 IF RIGHT$(D$,1)="@" THEN B=2 ELSE 
B=1 


100 D=EVAL (Ds) | 
110 IF ASC(D$) >64 THEN COPT PASS:LDA b ) 
:JSR OSWRCH: J:GOTO150 
120 IF D<O THEN D=(ABS(D) EOR &FFFF) +i 
130 byte=D MOD 256:PROCform 
140 IF B=2 THEN byte=D DIV 256:PROCfor 
m 
150 NEXT item 
160 ENDPROC 


Program 7.4. Macro assembly of VDU parameters. 
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170 

180 DEFPROCform 

i190 IF byte<>lbyte THEN COPT PASS:LDA 
#byte: J 

200 COPT PASS: JSR OSWRCH: J 

210 lbyte=byte 

220 ENDPROC 

230 

240 OSWRCH=SFFEE 

250 BCOL=&70: SQCOL=%71 

260 DIM START 256 

270 FOR PASS=0 TO 3 STEP 3 

280 PZ=START 

290 RESTORE 

300 

310 COPT PASS 

320 LDA #3 \SET SQUARE COLOUR 

330 STA S@QCOL 

340 LDA #%84 \SET BACKGROUND COLOUR 

3u0 STA BCOL 

360 J 

370 PROCvdu (39) 

380 COPT PASS 

390 \ ANY SOURCE CODE 

400 RTS: J 

410 

420 NEXT PASS 

430 CALL START 

440 

450 REM DATA IS A LIST OF VDU CHAINS 

460 REM YELLOW SQUARE/BLUE BACKGROUND 

470 DATA 22,2, 28,0,5,19,1, 24, 08, 08, 127 
9@, 7672,18,0, BCOL,16 

480 DATA 18,0, SQCOL, 25, 4, 5006, 5006, 25, 
1, 200@, 0@, 25, 81, 08, —2008, 25, 1, —2008, 04, 2 
5,81, 06, 2008 


Program 7.4. contd 


Program 7.4 takes advantage of the concept of conditional assembly. In 
this case, a conditional test is made to see if the next byte of data is the same 
as the byte preceding it. If this is so then a LDA #byte will not be required 
(since the data byte will already be in the accumulator). 

The operation of the program is as follows: 

The routine reads in each DATA element and tests for the ‘@’ character 
termination, setting the variable B to the number of bytes as appropriate. 
Line 100 uses EVAL rather than VAL to evaluate the data string (D$) and 
places the result into the variable D. This line is necessary when the 
following situations occur in D$. 


(1) A ‘&’ precedes a number, indicating hexadecimal. 
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(2) A labelled location is encountered, in which case the assigned location 
address is put into the variable D, assuming, of course, it has been 
previously defined. 

(3) The temporary ‘@” character is ignored (it has outgrown its usefulness). 


Line 110 tests whether D$ contains numeric or alpha data. If the data is 
alpha — that is, a labelled location — then the accumulator is loaded with the 
assigned address since, in this case, we do not want immediate addressing. 

Line 120 checks if D is negative so that a two-byte two’s complement 
form can be generated. Line 130 forms the low-byte and line 140 forms the 
high-byte if required. PROCform handles conditional assembly using 
immediate addressing. 

With the upgraded BASIC II ROM installed, the above program can 
still be used but a more convenient version is given in Program 7.5. The 
difference is that we need not leave the assembler to use the macro. The 
pseudo op-code EQUS can place a string, where positioned, within a 
program. If we use the function FNvdu(N) which returns a null string (we 
do not want a string returned) then all the goings on of the macro will be 
performed without the complication of leaving the assembler! 

This type of modification can be employed in similar programs where 
procedures are encountered on breaking out of the assembler. For the sake 
of standardisation, the method is not used in further examples since large 
numbers of BBC Micro’s have BASIC I installed. 


>LIST 
10 REM CONDITIONAL ASSEMBLY PROGRAM 


20 REM FOR CHAINING VDU PARAMETERS 
30 GOTO2Z40 


20 DEF FNvdu (N) 

60 LOCAL D,D%,B,byte,item,lbyte 

70 FOR item=1 TON 

B80 READ DS 

90 IF RIGHT$(D$,1)="6" THEN B=2 ELSE 
B=1 

100 D=EVAL (DS) 

110 IF ASC(DS) >64 THEN COPT PASS:LDA D 
:JSR OSWRCH: J]: GOTO150 

120 IF D<O THEN D=(ABS(D) EOR &FFFF) +1 

130 byte=D MOD 256:PROCform 

140 IF B=2 THEN byte=D DIV 256:PROCfor 


150 NEXT item 

160 ="" | 

170 

180 DEFPROCform 

190 IF byte<>lbyte THEN COPT PASS:LDA 
#byte: J 


Program 7.5. Macro assembly of VDU parameters (BASIC II onwards). 
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200 COPT PASS: JSR OSWRCH: J 
210 lbyte=byte 

220 ENDPROC 

230 

240 OSWRCH=SFFEE 

290 BCOL=%70: S@COL=&71 

260 DIM START 256 

2/9 FOR PASS=0 TO 3 STEP 3 
280 PZ=START 

270 RESTORE 


300 

310 COPT PASS 

320 LDA #3 \SET SQUARE COLOUR 

330 STA SQCOL 

340 LDA #&%84 \SET BACKGROUND COLOUR 


350 STA BCOL 

3260 EQUS FNvdu (39) 

370 RTS:] ‘ 

380 

3290 NEXT PASS 

400 CALL START 

410 

420 REM EXAMPLE LIST OF OSWRCH CHAINS 

430 REM YELLOW SQUARE/BLUE BACKGROUND 

440 DATA 22,2, 28,0,3,19,1, 24, 0@, 0@, 127 
9@,767@,18,0,BCOL,16 

450 DATA 18,0, SQCOL, 25, 4, 5008, 5008, 25, 
1, 200@, 0@, 25, 81, 0@, —2006, 25, 1, —200@, 08, 2 
5,81, 08, 2008 


Program 7.5. contd 


The look-up table approach 


Another approach to this problem is by using a look-up table. Program 7.6 
shows the essential differences. For a start, conditional assembly is not 
used as in the previous example and the use of labelled locations has been 
dropped. The DATA elements in the BASIC routine are split up into the 
relevant single bytes and stored as a look-up table from the location 
labelled ‘data’ onwards. The short piece of code at lines 230 to 290 can be 
placed anywhere within the source program to access this table 
sequentially. The term ‘look-up’ is used since any element can be looked up 
by setting the index register to the required offset from the table base 
address. It is conventional to place data tables such as this at the end of a 
program. The previous example has the edge on speed but this method is 
more economical in the use of memory locations. 

Program 7.7 is an example of the type of moving graphics available by 
using the macro approach. The program sets up a graphics screen in mode 2 
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>LIST 

10 REM USING DATA TABLES FOR 

20 REM CHAINING VDU PARAMETERS 

30 GOTO160 

40 

50 DEFPROCdatatable(N) 

60 FOR item=1 TO N 

7QO READ D&S 

S80 IF RIGHT$(D$,1)="@" THEN B=2 ELSE 

B=1 

90 D=EVAL (DS) 

100 IF D<O THEN D=(ABS(D) EOR &FFFF) +1 
110 ?PZ=D MOD 256:P2Z=P2+1 

120 IF B=2 THEN ?P%=D DIV 256:P2Z=P2+1 
130 NEXT item 

140 ENDPROC 

i150 

160 OSWRCH=2FFEE 

170 DIM START S00 

180 FOR PASS=0 TO 3 STEP 3 

190 PZ=START 
200 RESTORE 


210 

220 COPT PASS 

230 LDY #0 \LOOP OUTPUTS ALL 
240 .LOOP \DATA ITEMS STORED 


250 LDA data, Y \FROM data ONWARDS 
260 JSR OSWRCH \BY PROCdatatable 


270 INY \AND CAN BE PLACED 
280 CPY #53 \ANYWHERE WITHIN 
290 BNE LOOP \SOURCE CODE PROG. 
300 BEG FINISH 

310 .data 


320 IPROCdatatable(39) 

330 COPT PASS 

340 .FINISH 

350 RTS:] 

360 

370 NEXT PASS 

380 CALL START 

390 

400 REM DATA IS A LIST OF VDU CHAINS 

410 REM YELLOW SQUARE/BLUE BACKGROUND 

420 DATA 22,2, 28,0, 3,19,1, 24, 0@, 0@, 127 
9@, 767@, 18,0,132,16 

430 DATA 19,0,3, 25, 4, 5008, 500@, 25,1, 20 
0@, 0@, 25, 81, 0@, —2008, 25, 1, -200@, 0@, 25, 81 
, 08, 2008 


Program 7.6. Using data tables for chaining VDU parameters. 


and produces the ubiquitous bouncing ball, by now the standard 
apprenticeship exercise in the use of animated graphics. The macro is used 
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at line 310 to set up the screen and again at line 980 to form the subroutine 
BALL. 

Any Moving OBject (MOB) must have an associated velocity. In simple 
cases this is little more than an increment added to the object’s previous 
position so as to calculate its next position on the screen. An acceleration, 
incidentally, can be mimicked (position-wise) by adding a further steadily 
increasing increment as new positions are calculated. When the object 
reaches a boundary, the velocity (increment) must reverse sign if it is to 
remain on the screen. This is straightforward to program in BASIC. In 
machine code, however, the object’s position on the screen is a two-byte 
number. You could be forgiven for thinking that the increment need only 
be a single-byte number because of its normally small value. However, this 
is not so. Numbers differing in byte length cannot be added if they are of 
mixed sign, therefore we must also have a two-byte increment. 


>LIST 

10 REM BOUNCING BALL EXAMPLE 

20 GOTO 230 

30 

40 DEFPROCvdu (N) 

~O LOCAL D,D$,B, byte,item, lbyte 

60 FOR item=1 TO N 

7O READ Ds 

BO IF RIGHT#$(D$,1)="@6" THEN B=2 ELSE 
B=1 

90 D=EVAL (DS) 

100 IF ASC(D#) >64 THEN COPT PASS:LDA D 
:JSR OSWRCH: 31:GOTO 140 

110 IF D<O THEN D=(ABS(D) EOR &FFFF) +1 

120 byte=D MOD 256:PROCform 

130 IF B=2 THEN byte=D DIV 256:PROCfor 


140 NEXT item 

150 ENDPROC 

160 

170 DEFPROCform 

180 IF byte<>lbyte THEN COPT PASS:LDA 
#byte: J 

190 COPT PASS: JSR OSWRCH: J 

200 lbyte=byte 

210 ENDPROC 

220 

250 OSWRCH=&FFEE: OSBYTE=&FFF4 

240 X=&703 Y=&72: XINC=&74: YINC=&76 

2090 BCOL=%78 

260 DIM START 1000 

270 FOR PASS=0 TO 3 STEP 3 

280 PZ=START 

290 RESTORE 


Program 7.7. the ubiquitous bouncing ball. 
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500 REM SET UP SCREEN 


5310 PROCvdu (16) 


320 DATA 22,2, 28,0,3,19,1, 24, 0@, 0@, 127 
, 767@, 18,0, 132,16 


350 
3540 
3550 
360 
370 
380 
390 
400 
410 
420 
430 
440 
450 
460 
470 
480, 
490 
900 
2710 
320 
330 
240 
IO 
260 
370 
280 
390 
600 
610 
620 
630 
640 
650 
6560 
670 
680 
690 
700 
710 
720 
730 
740 
750 
760 
770 
780 
790 


COPT PASS 


LDA 
STA 
STA 
STA 
STA 
STA 
STA 
LDA 
STA 
STA 


#0 

X 

Y 
XINC+1 
YINC+1 
X+1 
Y+1 

#8 

X INC 
YINC 


. LOOP 


LDA 
JSR 
LDA 
STA 
JSR 
LDA 
CLC 
ADC 
STA 
LDA 
ADE 
STA 
LDA 
CLC 
ADC 
STA 
LDA 
ADC 
STA 
LDA 
STA 
JSR 
LDA 
CMP 
BCC 
LDA 
EOR 
STA 
LDA 
EOR 
STA 
INC 
BNE 


#%135 
OSBYTE 
#4 
BCOL 
BALL 

x 


XINC 

X 

X+1 
XINC+1 
X+1 

Y 


YINC 

Y 

Y+1 
YINC+1 
Y+1 

#7 
BCOL 
BALL 
¥+1 

#3 
YONSCR 
YINC 
#&FF 
YINC 
YINC+1 
#&FF 
YINC+1 
YINC 
YONSCR 


NINITIALISE X,Y,XINC 
\NAND YINC 


\WAIT FOR FIELD SYNC 


\PLACE BALL ON SCREEN 
\IN FOREGROUND COLOUR 


\NADD XINC TO X 
\(2 BYTES) 


\ADD YINC TO Y 
\(2 BYTES) 


\DELETE BALL WITH 
\BACKGROUND COLOUR 


\CHECK Y ON SCREEN 


\FIND 2’s COMPLIMENT 
\OF YINC (2 BYTES) 


Program 7.7. contd 
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B00 INC YINC+1 
810 .YONSCR 


820 
830 
840 
850 
860 
870 
880 
890 
900 
710 
920 
930 


LDA 
CMP 
BCC 
LDA 
EOR 
STA 
LDA 
EOR 
STA 
INC 
BNE 
INC 


X+1 

#5 
XONSCR 
X INC 
#2FF 
XINC 
XINC+1 
#27 F 
XINC+1 
X INC 
XONSCR 
XINC+1 


\CHECK X ON SCREEN 


\FIND 2’s COMPLIMENT 
\OF XINC (2 BYTES) 


171 


940 .XONSCR 

950 JMP LOOP 

960 

970 .BALL 

980 IPROCvdu (13) 

990 DATA 18,0, BCOL, 25,4,X,X+1,Y,Y+1,25 


5 1,08, —-8e 
1000 CRTS:] 
1010 


1020 NEXT PASS 
1030 CALL START 
1040 END 


Program 7.7.contd 


The program is liberally remarked but one area worthy of further 
comment is where the on-screen position checks are made. Referring to the 
case of the Y screen boundary checks in lines 690 to 710, the CMP 
instruction sets the carry flag if M<=A and clears it if M>A. In this case, 
the CMP #3 instruction in line 700 will set the carry flag if the accumulator 
contents Y+1 >= 3. This will occur when the accumulator contents are 
between 3 and 255 inclusive (&3 and & FF). Consequently, the carry flag 
will be clear when the high byte of the screen position Y+1 takes a value of 
0,1, or 2 which are legitimate screen positions. The end result of the 
conditional branch in line 710 is that XINC (2 bytes) is only reversed in sign 
(two’s complement) when the carry flag is set. Therefore, only one 
comparison is required to test for both the top and bottom screen 
boundaries. The screen boundary tests in the other dimension are 
conducted in a similar fashion. 


Summary 


1. High levels of subroutines nesting can overflow the stack. 
2. Each unreturned JSR uses two stack locations. 
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10. 


11. 


12. 


ES. 


. Each JSR uses 6 clock cycles and RTS another 6, so a JSR within a 


loop can squander time. 


. Resident subroutines, within the ROM operating system, are plentiful 


and easy to use in machine code programs. 


. Where the screen display is involved, there are speed advantages to be 


gained by using direct screen addressing. However, the program may 
not work through the Tube with the second processor. If you never buy 
one, this won’t matter anyway. 


. A vector is a two-byte word in memory which is the address of a 


routine. 


. Some resident subroutines have vectored addresses. Changing the 


contents of the vector allows interception to a different routine. 


. Most vectored addresses are in page 2. 
. Graphic facilities are handled by OSWRCH at address &FFEE, 


indirected via &020E. 

The sound generators and envelope shaping are handled by 
OSWORD. 

Input data is handled by OSRDCH at address & FFEO, indirected via 
&0210. 

Although the assembler does not offer macro facilities, they can be 
simulated by temporary transfer from assembly code to a BASIC 
procedure. 

If the new BASIC II ROM is installed, the EQUS structure can be used 
within the assembler for simulating macros and other functions. 


Self test 


7.1 


72 


ws 
7.4 


LE) 


Use the macro procedure to draw a large red square on a yellow 
background in MODE 2. 

Use the look-up table method to draw a yellow triangle on a blue 
background. 

Program the sound generator to play a scale in F# major. 

Add a keyboard controllable bat routine to intercept the ball in 
Program 7.7. 

Employ the user-definable character method for plotting the ball in 
Program 7.7. 


Chapter Eight 


Direct Screen Addressing 
and Hardware Scrolling 








Direct screen addressing overview 


Addressing screen locations directly has some advantages over using 
operating system subroutines such as OSWRCH. Faster and smoother 
moving graphics can be devised. The disadvantage of directly addressing 
screen locations (incompatibility with the Tube) has been emphasised 
before. When using direct screen addressing, the extra work involved in 
programming is considerable. The screen can no longer be considered as a 
convenient rectangle of plot points with X, Y coordinates. Instead, the 
screen is considered as a block of memory locations, laid out in a form 
similar to the screen layout map for MODE 2 shown in Fig. 8.1. In practice, 
the most commonly used mode for this method of programming is MODE 
2 since the full sixteen colours are available to the user with a reasonably 
high resolution of 160*256 pixels. It is not practicable in a book of this size 
to discuss all MODES so direct screen addressing in this chapter will refer 
entirely to a MODE 2 screen. 

The programs in this chapter should be helpful when writing games or 
educational material, particularly those. which involve animation. They 
show how faster and smoother graphic displays appear when use is made of 
machine code routines called from BASIC. 

In MODE 2, each screen memory location which is written to will light 
up two pixels. If we are happy with a minimum movement of two pixels ata 
time in an animated sequence then the whole process simplifies to shifting 
bytes around in the screen memory area shown in Fig. 8.1. Any of the 
sixteen colours can be selected for each pixel by setting appropriate bits 
within the byte written to screen memory. Figure 8.2 shows how this is done 
and the relationship to pixels lit on the screen. 

Each pixel is represented by a nibble (4 bits) in a somewhat staggered 
format as shown. All that is needed is to set each nibble to the required 
colour code. These are the standard colour numbers as used in BASIC (0 to 
15). If only one pixel is required to be lit then the other can be set to the 
relevant background colour, usually all zeros (default black). To reinforce 
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Fig. 8.1. The MODE 2 screen map. 
Left hand pixel bits OY) 








Right hand pixel bits 
| Red Yellow 
ofofofojolififa 
Bit setting example Pixel pair lit on screen 


Fig. 8.2. How a MODE 2 screen byte is set up. 
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the point, try the following simple routine and experiment with the value 
loaded into the accumulator in line 40: 


10 MODE 2 

20 PZ=%0D00 

30 C 

40 LDA #7 \Load byte | 

SO STA &7FFO \Light up pixel pair 
60 RTS: ] 

79 CLS 

80 CALL &ODOO 


The routine lights up a red pixel to the left of a yellow pixel at the bottom 
right-hand corner of the screen. The byte loaded in line 40 consists of the 
code for red (1 binary) and the code for yellow (11 binary). These 
amalgamated insthe form shown in Fig. 8.2 produce a byte 0000 O111. 

As an example of how fast direct screen addressing can be, a screen fill 
routine is given in Program 8.1. This program sets the whole screen to red 
by writing a byte of 0000 0011 to each screen location. Blocks of 128 bytes 
are sent to the screen memory by indirect indexed addressing. The reason 
for this will become apparent later when we start to whip UFOs and things 


around the screen. 
>LIST 
10 REM FILLING A COLOUR SCREEN ; 
20 REM DIRECT SCREEN LOCATIONS MODE 2 
30 OSWRCH=&FFEE 
40 LOC=&70 
SO DIM START 256 
60 FOR PASS=0 TO 2 STEP 2 
70 P%Z=START 
80 COPT PASS 
90 LDA #&16 \SET UP MODE2 
100 JSR OSWRCH 
110 LDA #2 
120 JSR OSWRCH 
130 LDA #&00° \SET LOC TO SCREEN 
140 STA LOC \START ADDRESS 
i50 LDA #&30 \(2 BYTES) 
160 STA LOC+i1 
170 .BEGIN 
i180 LDA #23 
190 LDY #&7F 
200 .LOOP 
210 STA (LOC),Y 
220 DEY 
230 BPL LOOP 
240 CLC 
250 LDA LOC \ADD 128 TO LOC 


Program 8.7. Filling a colour screen. 
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260 ADC #&80 
270 STA LOC 

280 BCC SKIP 
270 INC LOC+1 


300 .SKIP 

310 CMP #&00 \COMPARE LOC TO END 
320 BNE BEGIN \ADDRESS OF SCREEN 
330 LDA LOC+1 \IF NOT THE SAME 
240 CMP #&680 \BRANCH TO BEGIN 


300 BNE BEGIN 
260 RTS: J 

370 NEXT PASS 
380 CALL START 


Colour animation by addressing direct screen locations 


In this section, a simple animated sequence will be described which moves a 
32 pixel red rectangle from the top to the bottom of the screen smoothly 
and without any perceptible flicker. The routine is artificially slowed down 
by waiting for the field synchronisation pulse. If this is not done the 
rectangle flies across the screen so fast that the T V system cannot reproduce 
it. This is not a problem ina large practical game, since many other objects 
will be moved in between. However, timing plays an important part in 
animation if pleasing results are to be obtained. 

The labelled address LOC is set to the screen start address in MODE 2, 
which is &3000 taking 2 bytes of zero page memory. After waiting for the 
frame synchronisation pulse, the red rectangle is placed on the screen with 
indirect indexed addressing, scooping up the arbitrary 16 bytes necessary 
to form the image. Since in this case we want a red rectangle, the 
accumulator is set to &03. The image is then deleted ina similar manner but 
this time the accumulator contains the code for black (&00). The value 8 is 
then added to the locations LOC to calculate the next image position. The 
reason for this is to move the image by two pixels to the right. Reference to 
the MODE 2 screen map in Fig. 8.1 will help here. After checking that the 
calculated position is still on the screen, the cycle is restarted. Program 8.2 
shows how single colour objects can be moved over the screen. 


>LIST 
10 REM COLOUR ANIMATION BY ADDRESSING 


20 REM DIRECT SCREEN LOCATIONS MODE 2 
30 OSWRCH=%FFEE: OSBYTE=&FFF4 

40 LOC=&70: COL=%72 

30 DIM START S500 

60 FOR PASS=0 TO 2 STEP 2 

7O PZ=START 


Program 8.2. Single colour MOBs. 


80 

90 
100 
110 
120 
130 
140 
150 
160 
170 
180 
i90 
200 
210 
220 
230 
240 
250 
260 
270 
280 
290 
300 
310 
320 
330 
340 
350 
360 
370 
380 
390 
400 
410 
420 
430 
440 
450 
460 
470 
480 
490 
500 
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COPT PASS 
LDA #&16 
JSR OSWRCH 
LDA #2 

JSR OSWRCH 
LDA #&00 
STA LOC 
LDA #&30 
STA LOC+i1 
- BEGIN 

LDA #3 

STA COL 
JSR OBJECT 
LDA #&13 
JSR OSBYTE 
LDA #0 

STA COL 
JSR OBJECT 
CLE. 

LDA LOC 
ADC #&8 
STA LOC 
BCC SKIP 
INC LOC+1 
. SKIP 

CMP #&00 
BNE BEGIN 
LDA LOC+I1 
CMP #&80 
BNE BEGIN 
BE@Q FINISH 
- OBJECT 
LDY #&0OF 
LDA COL 

. L_OOP 

STA (LOC),Y 
DEY 

BPL LOOP 
RTS 
-FINISH 
RTS: J 

NEXT PASS 
CALL START 


Multi-colour MOBs 


The use of user-defined characters has one drawback — only single colour 
characters can be defined. The standard way to get over this is overprinting 


\SET UP MODE2 


\SET LOC TO SCREEN 
\START ADDRESS 
\(2 BYTES) 


\SET BOTH PIXEL 
\COLOURS TO RED. 


\WAIT FOR FIELD SYNC 


\SET BOTH PIXEL 
\COLOURS TO BLACK 
\ CBACKGROUND COLOUR) 


\ADD 8 TO LOC 


\COMPARE LOC TO END 
\ADDRESS OF SCREEN 
\IF NOT THE SAME 
\BRANCH TO BEGIN 


\PLACE OBJECT ON 
\SCREEN (16 BYTES) 
\ (32 PIXELS) 


Program 8.2. contd 
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with another character of chosen colour, thus building up the required 
multicolour object. This can be a time-consuming process even in 
assembly language. The problem does not arise with direct screen 
addressing. Program 8.3 shows the essential details. 


>LIST 
10 REM MULTICOLOUR ANIMATION BY 


20 REM ADDRESSING DIRECT SCREEN 
350 REM LOCATIONS IN MODE2 

40 GOTO140 

rar @ 

60 DEFPROCdatatable(N) 

7Q FOR item=1 TO N 

80 READ DS 

90 D=EVAL (D$) 

100 ?PZ=D:PZ4Z=P24+1 

110 NEXT item 

120 ENDPROC 

130 

140 OSWRCH=RFFEE: OSBYTE=&FFF4 
150 LOC=&70 

160 DIM START S00 

170 FOR PASS=0 TO 2 STEP 2 
180 PZ=START 

190 RESTORE 
200 COPT PASS 


210 LDA #&16 \SET UP MODE2 

220 JSR OSWRCH 

250 LDA #2 

240 JSR OSWRCH 

230 LDA #&00 \SET LOC TO SCREEN 
260 STA LOC \START ADDRESS 

270 LDA #&30 \(2 BYTES) 

280 STA LOC+i 

290 

300 .BEGIN 

310 LDY #&1F \PLACE OBJECT ON 
320 .~LOOP2 \SCREEN DEFINED 
250 LDA data, Y \BY data 

240 STA (LOC),Y 

3su0 DEY 

260 BPL LOOP2 

370 LDA #&13 \WAIT FOR FIELD SYNC 
3580 JSR OSBYTE 

=90 LDA #0 \DELETE OBJECT ON 
400 LDY #&1F \SCREEN BY BLANKING 
410 .LOOP1 \IN BACKGROUND 

420 STA (LOC),Y \COLOUR (BLACK) 
430 DEY 

440 BPL LOOPi 

450 CLC 


Program 8.3. A UFO as an example of a multi-colour MOB. 
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460 LDA LOC \ADD 8 TO LOC 
470 ADC #&8 

480 STA LOC 

490 BCC SKIP 

700 INC LOC+I1 


910 .SKIP 

320 LDA LOC \COMPARE LOC TO END 
3OO BNE BEGIN \ADDRESS OF SCREEN 
3740 LDA -LOC+1 \IF NOT THE SAME 
3020 CMP #&80 \BRANCH FO BEGIN 


360 BNE BEGIN 
270 BEQ FINISH 


380 

2370 .data 

600 JPROCdatatable(32) 

610 DATA 0,0,0,1,3,3,3,0,%40,1,3,3,9,3 
g 5 25 X80, 2, Sy 35 65 55 55 1,90,0,90,2,5,5,5,0 


620 

630 COPT PASS 
640 .FINISH 
650 RTS: J 

660 NEXT PASS 
670 CALL START 


Program 8.3. contd 


The program differs from the previous one in that a look-up table is used 
to form the object’s colours. The data consists of 32 sequential bytes (64 
pixels) which are ‘looked up’ in a method similar to that described in 
Chapter 7. Notice that this program is less complex, since only single byte 
data is handled by the BASIC procedure. Do not forget the even simpler 
method with BASIC II (or later versions) involving EQUS. 

The object itself can be planned out on grid paper with up to 128 
consecutive bytes, since BPL is used in the loop loading sequence. 


Address coordinates 


The method described previously is, by virtue of speed, preferred for small 
MOBs travelling in a horizontal direction. However, the movement in the 
vertical direction leaves a lot to be desired. Movement by 8 pixels at atime is 
not very satisfactory! Furthermore, the height of the MOB is restricted to 8 
pixels. The standard method of overcoming these problems is to employ a 
routine which translates address coordinates (80*256) to absolute screen 
addresses so that discontinuities in the memory map in the X and Y 
directions are avoided. The MODE 2 screen map consists of a matrix of 80 
bytes (2 pixels a byte) in the X direction and 256 bytes in the Y direction 
making 80*256=20K bytes in all. If we are satisfied with two-pixel 
movement in the horizontal direction as before, the concept of address 
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coordinates can be envisaged. The screen memory map given earlier is set 
out in blocks of eight sequential addresses. The difference between 
equivalent positions in any adjacent block of 8 in the Y direction is &280 or 
640 decimal. Similarly in the X direction the difference is 8. Therefore the 
equation to calculate a unique screen address from an XY address 
coordinate is given by: 


Screen address = &3000 + 8X + 640(Y DIV 8) + (Y MOD 8) 


where, &3000 is the screen start address; 8X is the X contribution because 
there are 8 address locations difference between adjacent X coordinates; 
640 (Y DIV 8) specifies the block of eight containing the Y coordinate; and 
Y MOD 8 specifies the position in that block of 8. 

The above equation is not suited to machine coding in its present form. 
We need to rearrange the equation so that all multipliers are, as far as 
possible, in exact powers of 2. Any multiplication or division then simplifies 
to shifting bits left or right respectively. This can be conveniently achieved 
as follows: 


& 3000 + 8X + 640(Y DIV 8) + (Y MOD 8)= &3000+ 8X + 80(8 (Y 
DIV 8)) + (Y MOD 8) 


Let YI=8(Y DIV 8) 
= &3000 + 8X + 80Y!1 + (Y MOD 8) 
= &3000 + 8X + 16Y1 + 64Y1 + (Y MOD 8) 


The coding of the above is now relatively easy. All (Y DIV 8) means is 
shift Y right 3 times, thus losing the 3 least significant bits. Multiplying by 
8, giving 8(Y DIV 8), is then achieved by shifting the result left 3 times. The 
net result of all this is just to lose the 3 least significant bits. They have fallen 
off the end. 

A simpler way of arriving at the same result is to mask out the 3 least 
significant bits of Y with AND#&FE. Similarly, all (Y MOD8) means is to 
recover the bits we lost inthe previous operation by masking out the 5 most 
significant bits of Y with AND #&7. The expression 8X is achieved by 
shifting X left 3 times. Two bytes will be required to accommodate the 
result on the last two shifts. 

The 64Y1 expression can be arrived at by shifting Y1 left six times. 
However, if you can imagine a two-byte result, shifting Y1 right twice and 
storing it as the high-byte of the result will be the exact equivalent. The low- 
byte of the result can be set to all zeros. 

Two further shifts right of the 64Y1 result will divide by 4, giving 16Y1. 
However, we must rotate the carry into the low-byte of the result on the last 
shift right. The LSR and ROR instructions, respectively, are needed. 

Adding the whole lot together gives the required screen address. 
Program 8.4 (lines 1080 to 1380) shows one way of coding the above. 
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2>LIST 
10 
20 
30 
40 
30 
60 
7O 
80 
90 
100 
110 
120 
130 
140 
150 


REM MOVING MULTICOLOURED OBJECTS 
REM AN IMPROVED METHOD FOR MODE 2 
REM USING 80#256 BYTE COORDINATES 
GOTO140 


DEF FNdatatable(N) 
FOR item=1 TON 
READ D& 

=EVAL ("&"+D$) 
?PAZ=D:PAZ=PA+1 
NEXT item 
=PASS 


OSWRCH=&FFEE: OSBYTE=&FFF4 
XCOORD=&70: YCOORD=&7 1 : wi dth=&72:he 


ight=&73: woount=&74: LOC=&75 


160 
170 
180 
190 
200 
210 
220 
250 
240 
200 
260 
279 
280 
290 
300 
310 
320 
330 
340 
350 
360 
370 
380 
390 
400 
410 
420 
430 
440 
450 
460 
470 
480 
490 


STORE=&77: data=&79: Yreg=&/7B 
tableil=&7C:tabl e2=&7E 
XCOORD1=&80: YCOORD1=&81 
XCOORD2=%82: YCOORD2=283 

DIM START 1000 

FOR PASS=0 TO 2 STEP 2 
PZ=START 

RESTORE 

COPT PASS 


LDA #&16 \SET UP MODE2 

JSR OSWRCH 

LDA #2 

JSR OSWRCH 

LDA #(shapel MOD 256) \STORE SHAPE 
STA tablel \TABLE 

LDA #(shapel DIV 256) \ADDRESSES 
STA tableiti 

LDA #(shape2 MOD 256) 

STA table2 

LDA #(shape2 DIV 256) 

STA table2+i 

LDA #0 

STA XCOORDI NINITIALISE BYTE 


STA YCOORD1 \COORDINATES OF 
LDA #34 \SHAPES 

STA XCOORD2 

yeaa), + ZI 
STA~-¥YECBURD2 

. LOOP 

INC XCOORD1I \UPDATE NEW SHAPE 


INC YCOORD1 \COORDINATES 
DEC YCOORD2 


Program 8.4. Moving more than one shape. 
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500 
510 
520 
530 
540 
550 
560 
570 
580 
590 
600 
610 
620 
430 
640 
‘650 
660 
670 
680 
690 
700 
710 
720 
730 
740 
750 
760 
770 
780 
790 
B00 
810 
820 
B30 
B40 
B50 
860 
870 
880 
890 
900 
910 
920 
930 
940 
950 
960 
970 
980 
990 
1000 





JSR SCREEN 

i DA YOCGORD2 

cMP (#60 + © 

BNE CUOP 

BEQ START 

- SCREEN 

LDA tablel \STORE ADDRESS OF 
STA data \FIRST SHAPE TABLE 
LDA tableit+ti \IN data 

STA datati 

LDA #&13 \WAIT FOR FIELD SYNC 
JSR OSBYTE 

LDX XCOORDI \CALL draw SUBROUT’E 
LDY YCOORD1 \WITH PARAMETERS IN 
JSR draw \X AND Y REG’S 

LDA table2 

STA data \STORE ADDRESS OF 
LDA table2+i \SECOND SHAPE TABLE 
STA datatl \IN data 

LDX XCOORD2 

LDY YCOORD2 \CALL draw SUBROUT’E 
JSR draw \AS ABOVE 

RTS 

- draw 

STX XCOORD \STORE PARAMETERS 
STY YCOORD \PASSED 

LDY #0 

LDA (data),Y \STORE SHAPE’S 

STA height \HEIGHT AND WIDTH 
INY \PARAMETERS OBTAINED 
LDA (data),Y \FROM DATA TABLE 
STA width 

LDX #2 \X SAVES Y (TEMP) 
-newr ow 

LDA #0 \CLEAR Yreg STORE 
STA Yreg 

LDA width \RELOAD wcount 

STA weount 

JSR CALCADDRESS \CALC SCREEN ADDr 
-newcolumn 

TXA \TRANSFER X TO Y 
TAY 

LDA (data),Y \PLACE ROWS OF DATA 
LDY Yreg \INTO SCREEN 

STA (LOC),Y \MEMORY UNTIL 

TYA \SHAPE IS COMPLETED 
ADC #8 

STA Yreg \¥reg SAVES Y (TEMP) 
INX 


Program 8.4. contd 


re 


1130 ASL 
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19010 DEC wcount 
1020 BNE newcolumn 
1030 INC YCOORD 
1040 DEC height 
1050 BNE newrow 
1060 RTS 

1070 

1080 .CALCADDRESS 
1090 LDA #0 

1100 STA STORE+1 
1110 STA LOC 

1120 LDA XCOORD 


\CLEAR LOCATIONS 


\CALC. 8#xX 
1140 ASL 

1150 ROL creme 
1160 ASL A 

1170 ROL STORE+1 
1180 STA STORE 
1190 LDA YCOORD 
1200 AND #2Fe@ 


he 


\CALC Y1=8*(YDIV®8) 


1210 LSR A \CALC 64*Y1 
1220 LSR A 

1230 STA LOC+I1 

1240 LSR A \CALC 16*Y1 
i250 LSR A 


\also clears carry 
\CALC 80*Y1 


i260 ROR LOC 
i270 ADC LOC+1 
1280 TAY 

i290 LDA YCOORD 


i300 AND #7 \CALC Y MOD 8 
1310 ADC LOC 
1320 ADC STORE \CALCULATE 


\CUMMULATIVE RESULT 
\FOR SCREEN ADDRESS 


i330 STA LOC 
1340 TYA 

i350 ADC STORE+1 
1260 ADC #&30 
1370 STA LOC+I1 
1380 RTS 

1390 

1400 .shapel 
1410 OPT FNdatatable(s86) 

1420 .shape2 

1430 OPT FNdatatable(62) 

1440 RTS: ] 

1450 NEXT PASS 

1460 CALL START 

1470 

1480 REM THIS IS THE shapel DATA 
1490 DATA E,& 

i500 DATA 0,0,0,0,0,0 

1510 DATA 0,90,0,0,90,90 


Program 8.4. contd 


Du Stor ett 
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1520 DATA 0,0,4F,8F,0,0 
1530 DATA 0,0,1,2,0,0 
1540 DATA 0,0,1,2,0,0 
i550 DATA 0,0,3,3,0,0 
15460 DATA 0,1,3,3,2,0 
1570 DATA 0, 3,946,340 
i580 DATA 0,3,3,3,3;0 
1590 DATA 0,3,3,3,3;0 
1600 DATA 0,3,3.3,35;0 
1610 DATA 0,0,2,1,0,0 
1620 DATA 0,0,0,0,0,0 
1630 DATA 0,0,0,0,0,0 
1640 

1650 REM THIS IS THE shape? DATA 
1460 DATA A, 

1670 DATA 0,0,0,0,0,0 
14680 DATA 0,0,0,0,0,0 
1690 DATA 0,0,1,2,0,0 
1700 DATA 0,0,3,3,0,0 
1710 DATA 0,1,3,3,2,0 
1720 DATA 0,3,9,6,3,0 
1730 DATA 0,3,3.3;35;0 
i740 DATA 0,0,2,1,0,0 
1750 DATA 0,0,0,0,0,0 
i760 DATA 0,0,0,0,0,0 


Program 8.4. contd 


Moving more than one shape 


The above program independently moves two shapes on the screen by 
accessing two separate data tables with a common draw routine. The 
shapes are built up, a data byte at a time, by the subroutine ‘draw’. This 
subroutine constructs a shape in a row by row fashion until completely 
drawn. The shape’s height and width in bytes must be first read in from the 
data table. Line 1490 contains this data. The subroutine places the 
requested shape on the screen by invoking the CALCADDRESS routine 
prior to placing each data byte of a new row into screen memory. There 1s 
no need to call CALCADDRESS for adjacent bytes in a row since simply 
adding 8 to the previous address gives the address of the next byte in the 
row. Two loops are needed for this and the routine is given in lines 750 to 
1060. The relevant data tables are at the foot of the program but note yet 
another way of incorporating data into a machine code program. The OPT 
statement has FNdatable(86) following it. Providing the datatable 
function returns PASS, all BASIC lines in the function are executed 
without ‘officially’ leaving the assembler. 

A border of background colour, one byte wide in the horizontal and two 
bytes in the vertical direction, is useful. This ensures that all the bits and 
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pieces remaining from the last drawn position are automatically erased 
which ever direction the shape is moving. On initialisation, the addresses of 
the supplied shape tables must be stored so that the subroutine SCREEN 
can specify which shape to draw. The X and Y address coordinates are then 
passed over to the draw subroutine in the X and Y registers respectively. 
Lines 560 to 730 are responsible for this. Lines 460 to 540 complete the 
program by incrementing the various coordinates of shapel and shape2 to 
update the next drawing positions. The overall structure of the program 1s 
such that more shapes can easily be added. 

A study of Program 8.4 (and its remarks) will reveal a few of the essential 
techniques for producing action video games. 


Hardware scrolling by programming the 6845 CRTC 


Hardware scrolling has many uses but ones that immediately spring to 
mind are applications such as word processing and producing moving 
landscapes in games programs. 

The 6845 CRTC controller has eighteen internal registers of which only 
two are of interest to us in this chapter. These are registers 12 and 13 which 
together, high-byte and low-byte respectively, are known as the Displayed 
Screen Start Address Register. By re-specifying the displayed screen start 
address, we can scroll in any direction (wrap around) as long as the register 
contains a ‘legal’ screen memory location for the particular MODE used. 
The software must ensure that this does not happen or some unpredictable 
results will occur! In MODES 0, 1, 2, 3 the 6845 CRTC controller generates 
80 characters a line where as in MODES 4, 5, 6 there are 40. These are not 
characters as seen on the screen, however. In MODE 2, for instance, a 
CRITIC character is only a quarter of a displayed character. The CRTC, 
then, sees a MODE 2 screen as 80*32=&0A00 characters. It is important to 
remember that the screen address, sent to the displayed screen start 
register, must be the actual screen memory location divided by 8. This 
arises from the CRTC dealing with characters consisting of 256/32=8 
output scan lines in MODE 2. From now on we will be concentrating 
exclusively on MODE 2 graphics screens for the reasons outlined at the 
beginning of the chapter. There is an added advantage in that each CRTC 
character sideways scroll represents only two pixels’ displacement, 
producing a very smooth movement. 


Sideways scrolling 


To scroll the screen one CRTC character to the left, the screen start address 
register must be incremented. Decrementing the register, on the other 
hand, will scroll the screen one character to the right. There are two 
ways of achieving this in assembly language. The first is to use the 
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assembler equivalent of the BASIC VDU 23 command; the other is to 
address directly the memory-mapped area SHEILA (256 bytes starting 
from & FE00). Incidentally, all the BBC Micro’s internal hardware registers 
are accessible at these locations. 

First, the operating system subroutine method will be described. The 
necessary VDU 23 commands to enable sideways scrolling are as follows, 
f SCR is the screen address low-byte and SCR+1 is the high-byte: 


VDU 23312,SCR303050 
VDU 23313, SCR+1505050 


Only the first of the two lines above is necessary if the screen Is to be 
sideways scrolled left by less than 256 characters since the actual screen 
start address in MODE 2 is &3000. The equivalent address, as required by 
displayed screen start address register, is &3000/8=&0600. Thus, SCR+T | 
will be constant at &06 for 256 bytes or scrolls. The VDU 23 commands can 
be conveniently coded in assembly language by our macro developed in 
Chapter 7 along witha simple test graphics screen. Program 8.5 shows this 
method, the object being to create a continuous sideways scroll to the left. 


10 REM SIDEWAYS SCROLLING USING THE 
20 REM OPERATING SYSTEM SUBROUTINES 
30 GOTO2Z40 


50 DEFPROCvdu (N) 

60 LOCAL D,D$,B, byte, item, lbyte 

70 FOR item=1 TO N 

80 READ D 

90 IF RIGHTS (D$, 1)="e" THEN B=2 ELSE 
B=1 

100 D=EVAL (D#) 

110 IF ASC(DS) >64 THEN COPT PASS:LDA D 
-JSR OSWRCH: J:GOTO150 

120 IF D<O THEN D=(ABS(D) EOR &FFFF) +1 

130 byte=D MOD 256: PROCform 

140 IF B=2 THEN byte=D DIV 256: PROCfor 


150 NEXT item 

160 ENDPROC 

170 

180 DEFPROCform 

190 IF byte<>lbyte THEN COPT PASS:LDA 
#byte: J | 

200 COPT PASS: JSR OSWRCH: J 

210 lbyte=byte 

220 ENDPROC 

2o0 


Program 8.5. Continuous sideways scrolling using operating system 
subroutines. 


240 
250 
260 
270 
280 
290 
300 
310 
3520 


330 
540 
350 
360 
370 
380 
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OSWRCH=&FFEE: OSBYTE=&FFF4 


SCR=&70 


DIM START 1000 
FOR PASS=0 TO 3 STEP 3 


PZ=START 
RESTORE 
PROCvdu (29) 


REM SET UP TEST GRAPHICS SCREEN 
DATA 22, 2,18,0, 132, 16,18,0, 3,25, 4, 
5006, 5006, 25, 1, 2006, 0@, 25, 81, O@, —200@, 25 
, 1, -2008, 0@, 25, 81,08, 2008 


COPT PASS 
.- BEGIN 
LDA #&00 
STA SCR 
LDA #&06 


\SET SCR (2 BYTES) TO 
\SCREEN START ADDRESS 
\AS SEEN BY THE CRTC 


187 


STA SCR+1 

. SCROLL 
410 LDA #&13 
420 JSR OSBYTE 
430 Jj 

440 PROCvdu (7) 
450 COPT PASS 
460 LDA SCR 
470 BNE OVER 
480 JPROCvdu(7) 
490 COPT PASS 
500 .OVER 

5i0 INC SCR 
S20 BNE SKIP 
530 INC SCR+I1 
540 .SKIP 

550 LDA SCR 
560 BNE SCROLL 
570 LDA SCRti 
580 CMP #&10 
590 BNE SCROLL 
600 BEQ BEGIN 
610 RTS:] 

620 NEXT PASS 
630 CALL START 
640 
650 
660 
670 


390 
400 
\WAIT FOR FIELD SYNC 


\INCREMENT SCR 
\(2 BYTES) 


\CHECK IF SCREEN END 
\NADDRESS AS SEEN BY 
\N\CRTC IS EXCEEDED 
\(MEM DIV 8) 


\IF SO RESTART CYCLE 


REM DATA TO SET UP 6845 CRTC REG’S 
DATA 238, 13,SCR, 0, 0@,0@,0 
DATA 238,12, SCR+1,0,0@,0@,0 


Program 8.5. contd 


When executed, the first thing to notice is that, when an object scrolls off 
left-hand side of the screen, it will reappear acharacter line higher at the 
right-hand side. The exception is at the top left of the screen where it will 
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reappear at the bottom right. To achieve a true sideways scroll, all the 
‘current’ bytes on the right-hand side of the screen will need to be moved 


down acharacter line in memory (wrap around). We say ‘current’, because 
the screen addresses corresponding to any fixed position on the screen will 
change as each hardware scroll is executed. Another option is continuously 
to generate a landscape strip a byte wide at the current memory locations 
corresponding to the screen’ extreme right-hand side, thus overprinting the 
above results. All this, however, will be regarded as an exercise. 
Experimenting with the programs can be a help here by placing markers at 
various known addresses and watching their progressions on the screen 
when scrolled. For example, the following could be inserted into Program 
8.5: 


LDA #3 \red marker (one byte wide) 
STA &3000\at default screen addr. 


An annoying flash occasionally occurs when register 12 is updated every 
256 scrolls. This is due to the time lag in setting register 12 after register 13 
or vice versa. Even waiting for the field synchronisation pulse has little 
effect on this, The problem can be overcome by directly addressing the 6845 
CRTC. The corresponding routine is shown in Program 8.6. 


10 REM PROGRAMMING THE 6845 CRITIC 

20 REM SIDEWAYS SCROLLING BY DIRECTLY 
30 REM ADDRESSING SHEILA LOCATIONS 

40 GOTO250 


60 DEFPROCvdu (N) 

70 LOCAL D,D%,B,byte,item, lbyte 

80 FOR item=1 TO N 

90 READ DS 

100 IF RIGHT#(D$,1)="@" THEN B=2 ELSE 
B=1 

110 D=EVAL (DS) 

120 IF ASC(DS) >64 THEN COPT PASS:LDA D 
:JSR OSWRCH: 31: GOTO160 

130 IF D<O THEN D=(ABS(D) EOR &FFFF) +1 

140 byte=D MOD 256:PROCform 

150 IF B=2 THEN byte=D DIV 256:PROCfor 


160 NEXT item 

170 ENDPROC 

180 . 

190 DEFPROCform 

200 IF byte<>lbyte THEN COPT PASS:LDA 
#byte: J 

210 COPT PASS: JSR OSWRCH: J 


Program 8.6. Sideways scrolling by directly addressing SHEILA locations. 
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220 
250 
240 
290 
260 
2/70 
280 
290 
300 
510 
320 
330 
340 


lbyte=byte 
ENDPROC 


OSWRCH=&FFEE :s OSBYTE=&FFF4 
SCR=&70 

DIM START 1000 

FOR PASS=0 TO 3 STEP 3 
PZ=START 

RESTORE 


REM SET UP A TEST GRAPHICS SCREEN 
PROCvdu (29) 
DATA 22,2,18,0, 132, 16,18,0,3, 25,4, 


5008, 500@, 25, 1, 2008, 08, 25, 81, 0@, —2008, 25 
, 1, -2006, 0@, 25, 81, 08, 2008 


350 
360 
370 
380 
390 
400 
410 
420 
430 
440 
450 
460 
470 
480 
490 
300 
310 
920 
3350 
740 
350 
360 
370 
380 
770 
600 
610 
620 
630 
640 
650 
660 
670 


COPT PASS 

- BEGIN 

LDA #&00 \SET SCR (2 BYTES) TO 
STA SCR \SCREEN START ADDRESS 
LDA #&06 \AS SEEN BY THE CRITIC 
STA SCR+Ii 

- SCROLL 

LDA #&13 \WAIT FOR FIELD SYNC 
JSR OSBYTE 

LDA #&0D \SEND (R13) ADDRESS 
STA &FEOO \TO ADDRESS REGISTER 
LDA SCR 

STA &FEOI \SEND SCR TO (R13) 
BNE OVER 

LDA #&0C 

STA &FEOO \SEND (Ri2) ADDRESS 
LDA SCR+I1 

STA &FEOI \SEND SCR+i TO (R12) 
. OVER 

INC SCR 

BNE SKIP \INCREMENT SCR 

INC SCR+1 \(2 BYTES) 

- SKIP 

LDA SCR \CHECK IF SCREEN END 
BNE SCROLL \ADDRESS AS SEEN BY 
LDA SCR+1 \CRTC IS EXCEEDED 
CMP #&10 \ (MEM DIV 8) 

BNE SCROLL 

BEG BEGIN \IF SO RESTART CYCLE 
RTS: ] 

NEXT PASS 

CALL START 


Program 8.6. contd 
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The 6845 CRTC registers can be accessed directly by storing the 
required register address (or number) in the 6845 address register at 
SHEILA location &00 (that is, & FEOO). The selected register can then be 
read from or written to at SHEILA address &0I. This is fairly 
straightforward and details are documented on the listing. 


Vertical scrolling 


Vertical scrolling involves no new principles other than adding 80 each 
time, the number of 6845 CRTC characters in a line, to the current screen 
start address. The listing is shown in Program 8.7 with the necessary 
details. Notice that, in this case, the high-byte of the screen start address 
register 1s set each time within the loop since the extra code needed to by- 
pass it would be counter-productive. 


LIST 
10 REM PROGRAMMING THE 6845 CRITIC 


20 REM VERTICAL SCROLLING BY DIRECTLY 

30 REM ADDRESSING SHEILA LOCATIONS 

40 GOTO2ZS0 

30 

60 DEFPROCvdu (N) 

7O LOCAL D,D%,B,byte,item, lbyte 

BO FOR item=1 TO N 

90 READ D¢ 

100 IF RIGHT#(D$,1)="@" THEN B=2 ELSE 
B=1 

110 D=EVAL (DS) 

120 IF ASC(D$) >64 THEN COPT PASS:LDA D 
:JSR OSWRCH: 3: GOTO160 

i130 IF D<O THEN D=(ABS(D) EOR &FFFF) +1 

140 byte=D MOD 256:PROCform 

i150 IF B=2 THEN byte=D DIV 256:PROCfor 


160 NEXT item 

170 ENDPROC 

180 

190 DEFPROCform 

200 IF byte<?>lbyte THEN COPT PASS:LDA 
#byte: J 

210 COPT PASS: JSR OSWRCH: J 

220 lbyte=byte 

250 ENDPROC 

240 : 

230 OSWRCH=&FFEE: OSBYTE=&FFF4 

260 SCR=&70 

270 DIM START 1000 

280 FOR PASS=0 TO 3 STEP 3 


Program 8.7. Vertical scrolling of a MODE 2 graphics screen. 
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2970 
300 
310 
320 
330 
240 


350 
360 
370 
3580 
390 
400 
410 
420 
430 
440 
450 
4640 
470 
480 
490 
300 
310 
320 
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300 
360 
370 
380 
390 
600 
610 
620 
630 
640 
650 
660 
670 
680 


Summary 


l. 


PZ=START 
RESTORE 


REM SET UP A TEST GRAPHICS SCREEN 


PROCvdu (29) 


DATA 22,2,18,0,132,16,18,0,3, 25,4, 
500@, 500@, 25, 1, 200@, 08, 25, 81, 08, —200@8, 25 
,1,—-2008, 08, 25, 81, 0@, 2008 | 


COPT FASS 
. BEGIN 


LDA 
STA 
LDA 
STA 


#200 
SCR 
#%06 
SCR+1 


- SCROLL 
LDA: #&13 


JSR 
LDA 
STA 
LDA 
STA 
LDA 
STA 
LDA 
STA 
CLC 
LDA 
ADC 
STA 
BCC 
INC 


OSBYTE 
#&0D 
&FEOO 
SCR 
&FEOI 
#%0C 
&FEOO 
SCR+1 
S&FEOI 


SCR 
#250 
SCR 
SKIP 
SCR+1 


- SKIP 


LDA 
BNE 
LDA 
CMP 
BNE 
BEQ 


RTS: 


SCR 
SCROLL 
SCR+1 
#%10 
SCROLL 
BEGIN 
Ee 


NEXT PASS 
CALL START 


Program 8.7. contd 


\SET SCR (2 BYTES) TO 
\SCREEN START ADDRESS 
\AS SEEN BY THE CRTC 
\WAIT FOR FIELD SYNC 


\SEND (R13) ADDRESS 
\TO ADDRESS REGISTER 


\SEND SCR TO (R13) 
\SEND (Ri2) ADDRESS 
\SEND SCR+1i1 TO (R12) 
\ADD 80 CHARACTER 


\POSITIONS AS SEEN BY 
\CRTC(2 BYTES) 


\CHECK IF SCREEN END 
\ADDRESS AS SEEN BY 
\CRTC IS EXCEEDED 
\(MEM DIV @) 


\IF SO RESTART CYCLE 


191 


Machine code graphics can be smoother and faster if direct screen 


addressing 1s used. 
Mode 2 is popular for coloured graphics because of the 16 variations. 


In Mode 2, each screen memory location handles two pixels. 
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10. 
11. 


2: 


13. 


. Movement of objects on the screen can be reduced to shifting bytes 


around in memory. This allows a dynamic resolution of two pixels per 
byte in Mode 2. 


. The bit pattern within the byte determines the pixel colour. 
. Each pixel within the byte occupies 4 bits (a nibble) so a single pixel 


blob of colour is achieved by setting the other nibble to background 
colour. 


. Mode 2 screen locations extend from &3000 (top left) to &7FFF 


(bottom right). 


. Direct screen addressing is often fast enough to beat the field 


synchronisation pulses so a ‘wait until pulse’ trap is often required. 


. Moving multicolour objects (MOBs), can be programmed by user- 


defined characters but overprinting with another colour is time- 
consuming and causes flickering. This problem does not arise with 
direct screen addressing. 

MOB colours can be achieved by look-up tables. 

Hardware scrolling can be achieved by direct action on the 6845 CRTC 
controller. Registers 12 and 13 in the controller contain the start 
address of the displayed screen. They are within the SHEILA address 
band. 

Sideways scrolling to the left can be achieved by incrementing, and to 
the right by decrementing, the start address register. 

Vertical scrolling is achieved by adding 80 to, or subtracting 80 from, 
the start address register. This is because there are 80 CRTC characters 
per line in Mode 2. 


Self test 


8.1 Adapt Program 8.5 to scroll the screen to the right. 

8.2 Adapt Program 8.7 to scroll the screen downwards. 

8.3 Adapt Program 8.4 to move three MOBs independently. 

8.4 Design a MOB of your own which traverses the screen horizontally. 


Chapter Nine 


Interrupt Techniques 
and the User Port 


The anatomy of the user port 


Superficially, the user port is a 20-pin socket situated at the rear of the 
machine. It is rather neglected, only a minority of owners putting it to use. 
There are several reasons for this lack of interest. In the first place, its use 
demands at least a smattering of electronic knowledge and perhaps some 
dexterity with a soldering iron. For some reason (snobbery, perhaps’), 
many reviewers and critics mention the soldering iron with a slight air of 
patronage. This is a pity because once the craze for games begins to wear 
thin (hopefully not too far distant in time.), practical work, combining 
computer interests with home-made electronic gadgetry can open up 
exciting new possibilities. Another reason for the neglect may be a failure 
to appreciate the sophistication built into the 6522 VIA. This powerful chip 
provides the versatility and power of the port. The 20-pin socket is only the 
visible interface. To put the port into perspective, it should be considered as 
a component of the VIA, the full complement being as follows: 


(a) Sixteen memory-mapped registers. 

(b) Two 16-bit timers (also under software control). 

(c) A serial port. 

(d) Two 8-bit parallel ports. referred to as the A and B sides, each with two 
handshake control lines. 


Apart from one of the 8-bit parallel ports (Side A, which is dedicated to 
forming the Centronics printer interface), all the above facilities are freely 
available to the user. 

Both the user port and the RS423 are undedicated links to the outside 
world but the user port is the more complex and, unlike the RS423, obeys no 
established protocol. Although there are twenty pins in the socket, only ten 
carry information. The rest are either unused or earth return and +5 V 
lines. The pin connections are shown in Fig. 9.1. In this chapter there will 
follow examples of routines rather than programs. They should help you to 
experiment with the user port so as to control external machinery, robotic 
projects etc. 
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~ + Pins 1 and 20 will connect 
5V 5V All zero a to outside wires of ribbon 


External view looking into socket 


Fig. 9.1. Pin connections on the user port. 


Never use gash wiring when you are experimenting. Resist the tempta- 
tion to make both connections to the pins withcrocclips or twisted wire. Croc 
clips are a boon in ‘heavy’ engineering hook-ups but getting them to stick onto 
the delicate pins of computer plugs (without shorting other pins) requires 
patience, dexterity and luck. Luck is not an acceptable commodity where 
the user port is concerned. Damage to the port can feed back through the entire 
VIA and consequently damage the health of the printer interface. The 
correct thing to do, before even contemplating experimentation, is to 
purchase the correct ribbon cable and socket which mates with the user 
port plug. The cable and plug can be bought a little more cheaply as 
separate items. The plug is of the type known as an ‘Insulation 
Displacement Connector’ (IDC for short). All you need is a vice to squeeze 
the plug directly onto the ribbon cable, the insulation is automatically 
pierced and good contact made during the vice pressure. No solder and no 
wire strippers are needed! However, if you are at all squeamish, or don’t 
have a vice (mechanical that is), you can buy the ribbon cable with the plug 
already attached. 

Before the connection is made to the plug on the user port you should do 
something to the other end, the free end of the cable. Don’t just leave it 
flopping about with bare wires sticking out. Connect them, even if only 
temporarily, to some form of positive terminal. This can be a multipin 
connector strip soldered directly to the cable wires. Alternatively, if you 
intend to embark on long experimental orgies, it is wise to invest in one of 
those boards containing rows of tiny sockets into which you can push 
components and wires directly. The best solution of all (although requiring 
more outlay) is to invest in one of the more ambitious contraptions on the 
market, providing on-board power supplies. For what it’s worth, we have 
used a Proto-board (trademark) for some years. It provides just about 
everything needed for safe experimentation. ICs, even the 40-pin species, 
resistors, capacitors,_LEDs and wire connecting links can all be pushed 
directly into any of the hundreds of tiny holes. The net result is a gadget 
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which allows quite complex circuitry to be assembled quickly without 
suffering second degree burns on the finger tips. Power supply bus bars run. 
round the edges, supplied from a 1.0 amp/5V power supply and a pair of 
+/-15 volt lines, ideal for supplying op-amp chips needed in hybrid 
systems. 

Although a +5V supply is provided to the port, it should be used with 
great care. Remember, the power comes from the long-suffering switched 
mode power supply which already is reasonably well-occupied, particularly 
if it has to supply a single drive disk. Another useful purchase which 
quickly becomes invaluable is one of the ‘logic probes’. The power supply 
for the probes is taken (via croc clips this time) from the 5 volt computer 
supply or, preferably, from the Proto-board. The probe can be placed on 
any point of the circuit and will indicate whether it is in the HIGH (logic 1) 
or LOW (logic 0). The nature of the indication will vary according to the 
make but will normally be the colour of a LED lamp. 

Notice from Fig. 9.1 that each signal line has its own independent earth 
line. All these earths are already electrically connected together inside the 
computer and will, in most cases, be similarly connected at the user device 
end of the ribbon cable. A signal line, and its own separate earth line 


running next to it behave together as a ‘transmission line’ with an || 
impedance of the order of 600 ohms. This technique helps to guard against « 


‘cross talk’ between wires and stray interference induced electromagneti- 
cally. The transmission line concept, however, borders on the tongue-in- 
cheek because no attempt is made to terminate the lines with a matched 
load — a necessary practice according to established theory. However, the 
ribbon cable connection is quite satisfactory, providing it is not too long. A 
couple of metres is probably the safe limit. 


The 6522 VIA registers 


Before discussing these, the overall disposition of the user port lines should 
be understood. Figure 9.2 is the effective diagram. 

The eight data lines are labelled PBO to PB7. Each one can be separately 
programmed as either an input or an output by setting appropriate bits ina 
direction register (see later). When programmed as outputs, they deliver 
the usual TTL logic levels but are also capable of delivering ImA at 1.5V. It 
is therefore possible to drive silicon transistor pairs in the Darlington 
configuration. (ie. two transistors connected in series to achieve gain 
multiplication). Since a Darlington can have exceptionally high current 
gains, it is possible to switch loads approaching, say, 0.5 amp directly. PB7 
has a unique feature. It can be programmed to receive the output of T1, one 
of the interval timers on the chip. PB6 is also unique when programmed as 


an input. sit is possible for the other timer (T2) to count the number of \ 


(pulses arriving at the input of PB6. 7. F 
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The handshake (control) lines are CBI and CB2. These can act as 
interrupt inputs or as handshake outputs. 


TIMERS 
COUNTERS LATCHES 


CONTROL-REGISTERS 
ih miod 3 


SHIFT REGISTERS Barto 
6522 VIA 


RB Ld dl LORE 





CB20 OCBI1 PB7C YOO O O OPBO 
Data lines 


Fig. 9.2. The electrical appearance of the user port. 


The VIA internal registers 


The VIA responsible for the user port occupies 16 consecutive addresses in 
the SHEILA band, starting at &FE60. The following table shows the 
addresses in the form of an offset to & FE60. Thus, if the offset number fora 
particular register is given as 3, then its absolute address is & FE63. It is 
better to think of the address of any individual register as consisting of the 
base address (& FE60) plus the offset number of the register. for example, 
the address base+ 5 is tidier than writing this as & FE6S. Table 9.1 shows the 
location of the registers in the VIA. 


Table 9.1 The VIA addresses. 


Register Designation Function 
offset no. 
Re 0 ORBorIRB Output or input register B 
gee" I ORA or IRA = Output or input register A 
2 DDRB Data direction register B 
Atel & 3 DDRA Data direction register A 
of 4 TIC-L T1 low-byte latch or T1 low-byte counter 
5 T1C-H Tl high-byte counter 


‘, 
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Table 9.1. contd 


Register Designation Function 
offset No. 
6 TIL-L T1 low-byte latch 
7 TIL-H T1 high-byte latch 
8 T2C-L T2 low-byte latch or T2 low-byte counter 
9 T2C-H T2 high-byte counter 
10 SR Shift register 
1] ACR Auxiliary Control Register 
12 PCR Peripheral Control register 
13 IFR Interrupt Flag Register 
14 IER Interrupt Enable Register 
15 ORA/IRA Identical to offset 1 but no handshake. 


Registers | and 3 will normally be left unused because, as stated 
previously, they are dedicated to the parallel printer interface. This still 
leaves fourteen registers left to grapple with. To understand them all is a 
formidable task. It would be demoralising to plod methodically through 
them in register order. Far better to attack the most commonly used ones 
first, the DDRB and the ORB. 


DDRB (Data Direction Register B) 

This is addressed at & FE62 (Base+2). The binary pattern initialised in DDR 
defines which of the lines PBO-PB7 will behave as outputs and which as 
inputs. The rule for direction in this register is as follows; 


A ‘I’ defines the corresponding pin as an output and a ‘0’ defines an 


input. 





Here are some examples: 


LDA #&FF \all data lines outputs 
STA DDRE 


LDA #&0F \PBO-PBS outputs 
STA DDRB \PB4-PB/7 inputs 


ORB or IRB (Output Register B or Input Register B) 

DDRB defines the directivity of the data lines only; it does not define their 
actual logic states. For example, just because we have a | in bit 7 of the 
DDRB, it does not mean PB? will be in the ‘I’ state. Only the contents of 
ORB or IRB control the logic states on the data lines. Thus, when a data line 
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is programmed as an output, the corresponding bit set into DDRB decides 
the logic on the data line. On the other hand, if a line is programmed as an 
input, the logic state received from a device on that line is entered into the 
Input Register (IRB). Ifa line is programmed as an input, then any attempt 
to write into that line via ORB is ignored. 


Example 1: 
LDA #&FF \Al1 data lines outputs 
STA DDRE 
LDA #&03 \Set PBO and. PBi to 1 
STA ORB \rest at O 


Example 2: 
LDA #0 \A11 data lines inputs 
STA DDRB 
LDA IRB \Read input reg. into A 
Example 3: 


LDA #&FO \PB4-PB7 outputs, 
STA DDRB \rest inputs 

LDA #&FF \Attempt to write il 
STA ORB \in all lines 


Example 3 illustrates the point made earlier that any attempt to write into 
lines programmed as inputs will fail. Although we have written &FF in the 
instruction, only the higher order nibble will directly affect ORB. The bits in 
the output register corresponding to PBO-PB3 will depend onthe peripheral 
input logic. It is assumed in these examples that DDRBand ORB have been 
assigned to &FE62 and &FE60 respectively. 

There are two variations of input behaviour depending on whether 
‘latching’ is enabled or disabled. If latching is disabled, the level present at an 
input (the relevant PB pin) is read into IRB. If the latch is enabled, the level 
read into IRB is that which existed after the ‘last active transition’ arriving 
on CBI (when a pulse of the correct phasing and shape hit CBI input). 
In other words, if the conditions existing now are required, then latch 
must be in the disabled condition. We only enable latching if we require 
CBI to act as a data-valid signal and we wish to ignore levels arriving 
after the latching. The latch enabling is carried out on bit | of the Auxiliary 
Control Register (ACR). If this bit is 1, the input latch on IRB is enabled. If 
0, it is disabled. The ACR (addressed at baset+ &B) controls many other 
functions, so it is essential that programming bit | iscarried out insucha way 
that the remaining bits are left undisturbed. This is where the logic 
instructions ORA and AND become useful (refer back to Chapter 3). 

To enable the input latch, study the following: 


LDA ACR \Obtain the ACR 
ORA #%02 \OR it with 0000 0010 
‘STA ACR \Bit-1 in ACR is now 1 
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To disable the latch: 


LDA ACR \Obtain ACR 
AND #&FD \AND with 1111 1101 
STA ACR \Bit-1 in ACR now O 


The two control lines 


The control lines CBI and CB2 can be used for a wide range of control 
functions apart from initiating interrupt action. 

We are concerned at this point only with the VIA registers which are 
involved either entirely or partly with these control lines. They will also be 
used for defining the kind of signal we wish to intitiate the interrupt. These 
and other registers in the VIA must be understood first. Any gadget we 
construct (or buy) which is going to initiate interrupts will emit an electrical 
pulse of fixed characteristics. The pulse (or electrical level) will, of course, 
have to conform to TTL protocol but, even so, there are many possible 
variants. For example, the initiating signal may be simply the drop in 
voltage from HIGH to LOW (a so-called negative-going edge) or it may be 
from LOW to HIGH (a positive-going edge). It may even be a narrow 
downward-going or an upward-going spike. The VIA is brilliantly designed 
to cater for many possible input conditions. Registers can be initialised to 
accept, as an active interrupt, one of the above signal patterns but ignore the 
others. You may also remember that if an interrupt request is made, it will be 
refused if the I bit in the Processor Status Register is 1. 


PCR (Peripheral Control Register) 


The two control, or ‘handshake’, lines CB1l and CB2 have other functions 


When CB2 is an input (bit 7 = 0) 


7} 6] 5] 4] 3} 2] fo 


Determines CB2 direction. 
0=input, 1 = output. 


Active edge of CB2 when it 
is an input. 
0 = active low, 1 = active high. 


When CB2 is an input, decides 
if it is normal or independent. 
0=normal, 1 = independent. 


Active edge of CB1. 
0 = active low, 1 = active high. 
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besides interrupt. CB1 is always an input but CB2 can be an input or an 
output. Although the PCR is an 8-bit register, we shall only treat the left- 
hand four (bits 4 to 7) because the other half is identical and dedicated to the 
parallel printer interface. It is also a fraction easier to understand if the bit 
pattern is first restricted to the case when CB2 is an input. 

When mention is made of the ‘active edge’, it refers to the setting of the 
appropriate flag in another register (IFR). That is to say, the only indication 
that an acceptable pulse has appeared on CBI or CB2 input is the setting of 
the appropriate flag. 

The terms ‘normal’ and ‘independent’, apply only to CB2 and, even then, 
only when it is an input. These terms concern the conditions under which the 
CB2 flag is reset after it has been set. in the normal mode, the flag remains set 
until a READ or WRITE instruction on the data registers, ORB or IRB is 
activated (for example, an LDA or STA). Inthe independent mode, oncea 
flag is set, READ or WRITE does not reset it. The normal mode is designed 
for handshaking operations between computer and peripheral. The 
independent mode is useful for actions not directly involving computer 
inteffaction. 


Behaviour of CB2 when it is an output 

CB2 becomes an output if bit 7 of the PCR is set to 1. Clearly, the 
significance of bits 6 and 5 is then completely different as can be seen from 
the following: 


(a) Let bit 5=0 and bit 6—0. This configures the so-called handshake mode. 
CB2 goes LOW by a write operation on ORB and goes HIGH again on an 
active transition of the CBI input signal. 

(b) Let bit S=1 and bit 6=—0. This is the pulse output mode. A negative-going 
pulse (goes from HIGH down to LOW then back again) is emitted following 
a write operation on ORB. Ideal for gadgetry which is activated by a 
negative-going pulse. Just do a dummy write to ORB with an STA ORB. If 
the pulse is in the wrong direction for the gadget, it is a simple matter to 
interpose an inverter. 

(c) Let bit 6—1. This is the so-called manual mode because both levels output 
on CB2 must be directly programmed. The level on CB2 depends on bit 5. If 
bit 5is 0, CB2 remains LOW. If bit Sis 1, then CB2is HIGH. In other words, 
providing bit 6 remains at 1,CB2 mirrors the state of bit 5. 


Before continuing with the next register, some consolidation examples 
may help to decipher all this terrible complexity. If you find the above just a 
bit too much, don’t despair, yet! Just study the following and console 
yourself with the fact that the control of sophisticated and versatile 
equipment (and the VIA is certainly within this catagory) can never be easy. 


(1) To configure a simple 8-bit input port, with CBI and CB2 not used: 
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LDA #0 
STA DDRB \make all 8 lines inputs 


This would be useful for reading a set of push-buttons. Once configured as 
above, IRB could be read into the accumulator for processing. 
(2) A simple 8-bit output port, with CB1 and CB2 not used: 


LDA #&FF 
STA DDRB \make all 8 lines outputs 


This is used for switching LEDs, etc. 
(3) An output port with CBI to be active high input and CB2 to be active 
high input in independent mode: 


LDA #&FF \Make all lines outputs 
STA DDRB 

LDA #&70 \set O111 in left half 
STA PCR 


Note that the right-hand four bits in the PCR (the A side of the VIA 
dedicated to the printer) have been tentatively reset to 0. This is not alwaysa 
safe procedure because the printer is controlled by the operating systemand 
should not be altered. The safe way, in the general case, would be to use 
masking techniques. For example: 


LDA #&FF 

STA DDRB 

LDA PCR \Bring out PCR 

AND #20F \Clear left,keep right 
ORA #&70 N\O111 in leftsleave right 
STA PCR \replace modified PCR 


(4) PBO-PB4 to be inputs, PBS-PB7 to be outputs. CBI to be active low | 
input; CB2 to be a negative-going pulse ona WRITE to the ORB: 


LDA #&%E£O \1110 0000 in DDRB 
STA DDRB 

LDA #&%AO \1010 in left half 
STA PCR \of PCR 


The previous remarks regarding the printer interface still apply but the extra 
coding, if considered important, would be the same as before. 

(5) All data lines to be inputs. CBI active high input. CB2 an output in the 
manual mode commencing with CB2 LOW. 


LDA #0 

STA DDRBE 

LDA #&DO \1101 0000 IN PCR 
STA PCR 
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The general concepts of interrupt techniques have been discussed in 
Chapter |. You will remember that it is possible for a signal, which could 
arrive from an external device, to set in motion a chain of events. Such 
events (if allowed to progress) interrupt the present program then bring into 
play an interrupt service routine. On completion of this routine, the original 
program is rejoined at the point from which it was interrupted. If all goes 
well, the interruption and the rejoining goes smoothly. However, there are 
many pitfalls to overcome and much to be learned before ‘things do go well’. 

As might be expected, the computer itself contributes littletowards the task. 
Apart from recognising the existence of an interrupt, most of. the pre- 
preparation is the programmer’s responsibility. The concept of an interrupt 
is based on unpredictability. The computer never knows when it is going to 
be interrupted so precautions must be taken to ensure that the interrupt 
service routine contains provisions for preserving the contents of all registers 
(on the stack is a good a place as any) before corrupting them with the new 
routine data. Before the return from interrupt (RTI) the original register 
data must be returned. Apart from the loss in time, the original program 
should not even be aware of the interruption. With regard to the question of 
time, there is a well-established rule in the BBC machine. It is unwise for 
interrupt routines to last longer than 2 milliseconds. 

The routine is deemed to start at the instant the request is granted and 
lasts until the recognition of RTI. The operating system periodically 
requests interrupt to service the screen, keyboard and other routine 
background tasks, hence the above warning. However, 2 milliseconds is a 
long time. The clock is 2 MHz so it is still possible to squeeze in about 500 
machine code instructions, even if they each take an average of 4 clock 
cycles. 


The IFR (Interrupt Flag Register) 


When a signal arrives on CB1 or CB2 (if it is an input) it sets an appropriate 
flag to 1 in the IFR. The significance of each bit in the IFR 1s as follows: 


Interrupt Techniques and the User Port 203 


7}6|5]4]3}2]}1 Jo. 


General interrupt status bit. 
1 if any interrupt active and enabled. 
0 when interrupt condition cleared. 


Timer 1 flag. 

1 when time out. 

0 after reading T1 low-byte counter or 
writing Tl high-byte latch. 


Timer 2 flag. 
Behaves similar to T1 above. 


CBI flag. 
Cleared by a read or write of ORB. 


CB2 flag. 
Cleared by a read or write of ORB. 


Shift register flag. 
1 at end of 8 shifts. 
Cleared by read or write of SR. 


CAI flag. 
(behaves as CBI pro rata). 


CA2 flag. 
(behaves as CB2 pro rata). 





It is possible, and sometimes desirable, to clear directly one or more of 
the flags in the IFR. This is done (rather strangely) by writing ‘1’s into the 
flag positions to be cleared. Direct clearing in this manner will normally be 
required when the CBI or CB2 inputs are being used for purposes other than 
6502 involvement - in particular, when using the manual mode (refer back 
to PCR). Here are some examples in direct clearing: 


(1) Clear CB2 flag 


LDA #%08 \oO000 1000 
STA IFR 
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(2) Clear CBI and T2 flags 


LDA #&30 \O011 0000 
STA IFR 


(3) Clear all flags 


LDA #&FF \1111 1111 

STA IFR 
Bit 7 is the general interrupt status and is the only flag which cannot be reset 
(or set) directly. Therefore, the | in bit 7 position above is really of no 
consequence. 


The IER (Interrup Enable Register) 


The bits in this register correspond exactly as described for the previous 
register (IFR). It represents a last ditch stand between the various interrupt 
request sources and the 6502 IRQ input pin. For example, there may have 
been an active signal arriving on, say, CBI. This will have set the CBI flag 
in the IFR. However, there may already be another flag or flags set. The 
6502 can only accept one interrupt at a time so there is clearly a need for 
higher status register which can be programmed to select which flag is to be 
recognised (enabled). This is the role of the IER. It operates as follows: 


Bits 0-6: 

| = enable 

0 = enable 

Bit 7: 

Like bit 7 in the IFR, this bit is special. 

When bit 7 = 0: Each | in a bit position is cleared (disabled). 
When bit 7 = |: Each | in a bit position enables that bit. 
(Zeros in bit positions are left unchanged) 


We found this quite terrible to understand. Here are some examples which 
we hope will help you: 


(1) Enable CBI interrupt and disable all others: 


LDA #&%6F \0110 1111 
STA IER \Bit-7=0 so ’*1’*s disabled 
LDA #&90 \1001 0000 
STA IER \Bit-—-7=1 so bit-4 enabled 


Note that the second pattern is the logical complement of the first. This is 
not a coincidence. 
(2) Enable Timer 1, disable the rest and then clear the T] flag bit in the IFR. 
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LDA #&3F \Disable all others 
STA IER 

LDA #&CO \1100 0000 to enable Ti 
STA IER 

STA IFR \Reset Tl flag in IFR 


Note the logical complement again. 


Organising a system in which only one device is expected to request 
interrupts is not too dificult. We can see from the above that the trouble 
starts when there are many devices, all requesting interrupt at the same 
time. Whatever machinations we employ down at the VIA end, they will 
always be subservient to the I bit in the 6502. If this is at 1, nothing can 
barge into the existing program except, of course, viathe NMI input which 
should be treated as sacrosanct anyway. It is worth repeating the 6502 
instructions directly concerned with interrupt control: 


SEI will set the I bit, preventing interrupts. 

CLI will clear the I bit. 

BRK will set the B bit, save the Program Counter and the Status Register 
on the stack, set the I bit and load the contents of addresses & FFFE and 
&FFFF into the Program Counter. 

RTI restores the Status Register and the Program Counter from the stack. 
Unlike RTS, it does not add | to the return address. 


The timers and counters 


It is always possible to generate delays (time intervals) by loading one or 
more of the 6502 instruction registers with the desired delay number and 
counting down to zero. This is not always satisfactory because it ties up the 
computer. To provide for independent delays and various other pulse- 
counting operations, the VIA is equipped with a variety of timers, counters 
and latches. They are useful for generating interrupts at regular intervals, 
triggering external devices or simulating a real-time environment. The two 
timers, Tl and T2, are essentially 16-bit counters. Each counter occupies 
two consecutive addresses (low and high byte) but T1, the more complex of 
the two, has an associated 16-bit latch, consequently occupying a further 
two addresses. Before treating the timers in detail, it is useful to begin with 
an overview of the possibilities on offer. 


(1) They may be read or written into as six memory locations, four for T] 
and two for T2. (See the VIA addresses given earlier in this chapter.) 
(2) Their respective operation modes are governed by bits 5,6 and 7 of the 
Auxiliary Control Register (this is treated later). 

(3) Their status, at any time during the counting phase, is obtainable by 
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examining bits 5 and 6 in the IFR. By ‘status’ we mean whether or not the 
programmed interval has ended (time out). 

(4) To generate a single time interval, a timer is loaded with the number of 
clock pulses required (to generate that interval). 

(5) Pulses arriving on PB6 can be counted until they compare with a 
previously loaded number (T2 only). The normal use of PB6 as one of the 
eight data lines is, of course, suspended. 

(6) T1 can be used to provide continuous time intervals. The time interval 
between pulses will depend on a previously loaded number. 

(7) A single, or continuous series of pulses can be produced on PB7; the 
pulse width will be dependent ona previously loaded number. In this mode, 
PB7 will not be available as a normal data line. 


The ACR (Auxiliary Control Register) Cu wv . ; 


716}s}4]3}2}i 
res. 


T1 control 
0 = one-shot mode. 
1 = free-running mode. 


T1 control 
0 = Disable PB7 output. 
1 = Enable PB7 output. 


T2 control 
0 = decrement by 6502 system clock. 
1 = decrement by input pulses on PB6. 


Used for controlling Shift-register 
(treated separately) 


Port B input latch 
0 = disable latch. 1 = enable latch. 


Port A input latch 
0 = disable latch. 1 = enable latch. 


Timer T2 details 
It is clear from the above overview that T2 is more simple than T1. It can 
only generate simple time intervals or count pulses arriving on PB6. 

The low-byte address of T2 is used to write or read the low-order byte of 
the delay number. The T2 interrupt flag is automatically cleared on a read 


Interrupt Techniques and the User Port 207 


action. The high-byte address is used to write or read the high-byte of the 
number. Writing to this address completes the timer loading, clears the T2 
interrupt flag and starts the timing operation. On completion of the timing 
interval, the T2 interrupt flag is set (bit 5 of IFR). Here are some examples: 


(1). To program a delay time, equivalent to 2048 clock pulses: 


LDA #0 \Ensure bit-S=0 for system 
STA ACR \clock mode 

STA T2Lowbyte \Clear low-byte 

LDA #208 ‘ 

STA T2Highbyte \This starts count 
LDA #&%20 \O010 0000 mask for bit—-5 
-BACK BIT IFR \Performs AND mask 

REQ BACK \Bit-S not yet set 

LDA T2Lowbyte \Dummy read clears flog , 


(2) To cause delay until 100 pulses have been counted from an external 
source on PB6: 


LDA #0 

STA DDRBE \Make data lines inputs 
LDA #&20 \O0010 0000 

STA ACR \Make bit-—S=0,pulse count 
LDA #264 \Prepare for 100 pulses 
STA T2Lowbyte 

LDA #0 

T2 Highbyte \ This starts count 
LDA #&20 \O0010 0000 mask for bit-S 
-BACK BIT IFR \Performs AND mask 
BE@ BACK \ Bit-S not yet set 

LDA T2Lowbyte \Dummy read clears f 


Since these two examples have used the BIT test you may find it necessary 
to refer back to chapter 3. 


Timer T7 details 
This timer has a 16-bit latch as well as the norma! 16-bit counter. It is also 
possible to generate an output on PB7. There are four different operating 
modes, depending on bits 6 and 7 inthe ACR. The choice is single-shot or 
free-running mode (bit 6) and enable or disable PB7 output (bit 7). 
Bit 7=0 will disable PB7 output. Bit 7=1 will enable PB7 output. 
Bit 6=0 is one-shot mode. Bit 6—1 is free-running mode. 

The addressing details and the start and finish of timing are virtually the 
same as described under T2 apart from the different addresses and bit 6, 
instead of bit 5, for the interrupt flag in the IFR. The free-running mode is 





208 Advanced Machine Code Techniques for The BBC Micro 


made possible by the provision of a separate 16-bit latch in the usual low- 
byte (TIL), high-byte (T1H) form. These occupy two separate addresses. It 
is possible, therefore, to read or write into the latches without affecting the 
associated timer count. In the free-running mode, the number in the latches 
is automatically re-entered into the timer againand the count restarted. This 
makes it possible to generate a wave form of any mark to space ratio on 
PB7. This is because the logic level (HIGH or LOW) on PB7 remains fixed 
within a timing interval but inverts to the opposite state during the next 
interval and so on. Some examples of T1 operations are now given. 


(1) To produce a wave form of unity mark to space ratio from PB7 which 
carries on indefinitely. The pulse width is to be equivalent to 1024 clock 
cycles. 


LDA #&FF \Make data lines outputs 
STA DDRE 
LDA #%CO \1100 O0000(bité,7 set to i 


STA ACR \Free-running moDe in Til 
LDA #0 

STA Til \Clear T1 low byte latch 
LDA #&%04 \&04=1024 dec in high byte 


STA TIH \Starts waveform action 


Note carefully that no loop is necessary in the the above coding to produce 
repetitive action because of the automatic re-entering of the latch into the 
timers. The above coding merely triggers off the action and the computer is 
then free to carry on with other work. This can be a boon when designing 
complex control devices fed from the user port. It is useful to remember 
that the flag in the IFR is still a valid signal if the completion of each timing 
interval is significant to the rest of the program. It could, of course, be made 
to initiate an interrupt service routine. 


(2) To produce an output on PB7 after 64535 clock pulses. 


LDA #0 \Set Ti in one-shot mode 
STA ACR 

LDA #&FF 

STA TiL \&FF in Ti low byte latch 
STA TIH \This starts timing 

LDA #&40 \0100 0000 mask for bit-6 
. BACK BIT IFR 

BE@ BACK \Flag not yet set 

LDA TiL \Dummy read to clear on 
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The SR (Shift Register) 


This will not be one of the commonly used registers. It is more suitable for 
serial data transmission, whereas the user port is oriented towards parallel. 
However. in the interests of continuity, the following information is given 
on the facilities available. As previously stated, bits 2,3 and 4in the ACR 
determine the behaviour of the SR so it is sufficient to limit the discussion 
to the eight permutations of the three bits. 








ACR Bits Effect on Shift Register 
000 Disable SR 
001 Shift in at Counter 2 rate 
010 Shift in at system clock rate 
O11 Shift in at external clock rate 
100 Free running output at Counter 2 rate 
101 Shift out at Counter 2 rate 
110 Shift out at system clock rate 
111 Shift out at external clock rate 


nnn EEE 


Summary of the user port and its functions 


Any chapter attempting to explain a device as complex as the user port is an 
ordeal for the writer and an even worse ordeal for anyone trying to make 
sense of it afterwards! A brief summary is therefore justified. 


1. 


2 


J 


The 16 registers within the 6522 VIA chip are addressed within the 
band & FE60 to &FEFF. 

The chip handles the parallel printer interface on the ‘A side’ and the 
user port on the ‘B side’. 

Any of the eight data lines, PBO to PB7, can be inputs or outputs 
depending on the bit pattern set into the DDRB. PB6 and PB7 have 
qualities unique to the rest. 


. Output states on the lines depend on writing to ORB. Input states 


(which can be either direct or latched, depending on bit | of the ACR) 
can be read from IRB. ORB and IRB share the same address. Trying 
to write to an input is a sterile exercise. 

Lines CBI (always an input) and CB2 are for control purposes. They 
can be used for handshaking data transfers to or from the eight data 
lines or for any other purpose thought desirable. Either of them can be 
used to set ‘interrupt flags’ in the IFR. The setting of a flag, however, 
need not actually initiate an interrupt. 
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6. The CB2 direction depends on bit 7 of the PCR. 

7. CBI and CB2 can be programmed (by the PCR) to accept either a 
falling or rising edge of a pulse. CB2 can also have its flag in IFR reset 
‘normally’ (by a read of IRB or a write to ORB) or ‘independently’. 

8. Two flags in the IFR act as event signals for CBI and CB2. They are 
reset automatically on a read of IRB ora write to ORB. They can also 
be directly cleared by writing ‘l’s into the flag bits (yes ‘1’!). Bit 7 of the 
IFR is immune to attempts directly to set or reset. It is a general 
signal to indicate if any flags at all are at |. 

9. The timers T1 and T2 are associated with low- and high-byte counters. 
Numbers placed in the counters determine the time intervals. The 
count starts and the previous flag (in the IFR) cleared when the high- 
byte is loaded. The flag is set after the count number loaded has 
decreased to zero. 

10. There are many possible modes of timer operation, determined by bits 
5,6 and 7 in the ACR. 

11. T2 is the simplest of the two. It can generate a delay of N clock pulses, 
terminated by a flag, or count input pulses arriving on PB6 input. 

12. TI can be used to provide various continuous waveforms on PB7 out- 
put, independent of continuing computer support. It can also provide 
simple delays. 

13. A programmable Shift Register is available, mainly for organising 
serial data activities. 


This completes the treatment of the user port. Although it is under some 
control from the central processor (6502), it is clearly capable of carrying 
out some operations almost under its own volition. This is why the 
treatment has, so far, made little reference to the operating system. The 
coding examples given have been in isolation but it is hoped they will at 
least point the way. It is worth emphasising again that the flags are often 
referred to (rather misleadingly) as ‘interrupt flags’. They can cause 
interrupt (if you let them) only by enabling the appropriate bit in the IER. 
Many interesting projects designed for attachment to the user port can be 
handled quite successfully without involving interrupt action. Nevertheless, 
this is no excuse for not attempting simple interrupt routines. The essential 
thing is to keep them simple, in fact very simple, during the learning period. 
It is better to get used to handling internally generated interrupts before 
progressing to their initiation from the user port. 


Internally generated interrupts 


The BBC operating system is unique in the way it exposes (almost 
indecently) its hidden workings to the ordinary user. Not only does it 
expose them, it almost demands they are made use of. One example of this 
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is the list of FX14 calls at the top of page 426 of the User Guide, which is 
given more detailed treatment on page 465. FX calls become OSBYTE calls 
when in machine code. Thus, if we wish to take advantage of these calls, we 
ensure that the accumulator contains 14 before calling OSBYTE at 
&FFF4. There are seven dynamic conditions, any of which can be chosen 
to initiate an interrupt, as set out in Table 9.2. — 


Table 9.2 OSBYTE Event Interrupt Table. 
Note: Place 14 in A and Event Number in X before calling &FFEE 





Event number Enable event causing interrupt 
(number in X) 





Output buffer empty 

Input buffer full 

Character entering input buffer 
A/D conversion complete 
Start of vertical sync pulse 
Interval timer crossing zero 
Escape key pressed 


Nn hh WN — © 





A table, similar to the above, applies to disabling interrupts, except that A 
must contain 13 instead of 14 before calling OSBYTE. 

The interrupts service routine, initiated by any of the above events, 
depends on the user and guidelines will be given later. However, the first 
problem is how to prepare the operating system to accept the routine and, 
what is more important, how to instruct the operating system to jump to it 
and return. The steps to ensure smooth linkage can be carried out in the 
following order: 


(1) Load A and X then JSR OSBYTE (&FFEE), as selected from the Event 
Table above. 

(2) Load the low- and high-byte starting address of your service routine in 
the indirection vector locations at &0220 and &0221 respectively. These 
addresses can be confirmed from page 465 of the User Guide. 


Program 9.1 is a skeleton program to illustrate the above steps. To keep 
everything simple, the service routine is a simple jump to OSWRCH which 
prints the letter ‘Z’ whenever the ESCAPE key is pressed. 

Lines 100 to 180 prepare everything for a smooth transition to the service 
routine. Lines 190 to 230 givethe example service routine. The event enabled 
is code 6 in the Event Table, so lines 110 and 111 load 6in X and 14inA, 
prior to calling OSBYTE. This sets in motion an interrupt to ‘SER VICE’ 
whenever the ESCAPE key is pressed. Lines 140 to 170 are responsible for 
placing the starting address of ‘SER VICE’ in the page two vectors &0220/ 1 
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10 REM SIMPLE EVENT INTERRUPT 
420 Dit 1 TEST 100 
20 OSBYTE=&FFF4: OSWRCH=2FFEE 
GO FRE ree tre mee ene ee em mies 
~0O FOR FASS=0 TO 3 STEF 3&3 
60 FAZ=ITEST 
FR FER ee ten ee meetin eae mira cen tam a ioe 
BO [ 
90 OFT FASS 
100 .FREPARE 
110 LDX #6 \Enable ESCAPE event 
29 LDA #14 
130 JSR OSBYTE 
140 LDA #Service MOD 256 
150 STA &220 \Low byte interrupt vector 
160 LDA #Service DIV 256 
170 STA &221 \High byte interrupt vector 
180 RTS 
190 ,Service 
200 FHA: TXA: FHA: TYA: FHA: FHF 
210 LDA #90 
220 JSR OSWRCH 
220 PLPsPLA: TAY: PLA: TAX:FPLA 
240 RTS 
2a 
"ML Melee pena ne aeat  naaala 
270 NEXT 


/ 


Program 9.1. Simple interrupt on event code 6. 


which the operating system reserves for user-supplied interrupt routines. 
Since the program uses dynamic storage space allocated by the DIM 
statement at the top, it is not possible to know in advance the absolute 
addresses of ‘SERVICE’. This is why lines 140 and 160 take advantage of 
the MOD and DIV instructions. These sort out the addresses in the form of 
low- and high-byte. 

‘Service’ starts, as all interrupt routines should start, with saving all the 
registers on the stack. This is done by line 200 and it is well to note carefully 
the order of stacking. The example routine is a simple call to OSWRCH to 
write the character *Z’. Line 230 restores, in reverse order, the registers to 
their previous values. 

RUN the program first and examine the assembly out, particularly the 
addresses for the start of ‘SERVICE’ which was generated by the DIM 
statement. | 

Then enter CALL PREPARE. The interrupt routine is now activated. 
This can be verified by pressing ESCAPE several times and noting the ‘Z’. 
This routine is permanent (until BREAK) and will appearto remainas part of 
the operating system. For example, type NEW to get rid of the BASIC 
program and type in a few lines of another program, say, a simple 
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FOR/ NEXT loop. Whenever ESCAPE key is pressed, the letter ‘Z’ will 
still appear. 

To counter possible ‘so what?’ criticism, we must admit that Program 9. | 
is not expected to be of the slightest use, apart from pointing the way to 
event interrupt handling. To have included a complex program as a service 
routine would have clouded the essential issues. Any coding you choose 
can be substituted in the space occupied by lines 210 and 220 (you can 
always RENUMBER 10,100) to get more line space.) Also, you can change 
the event code 6 in line 110 to experiment with other events. Try it with 
Event 4 in A so that ‘SERVICE’ interrupts whenever a vertical sync pulse 
appears (which is very often!). The screen fills up with Zs until you 
BREAK. 


Peripheral initiated interrupts 

Once the general form of Program 9.1 is understood, it is comparatively 
simple to transfer the idea from Event interrupts to peripheral interrupts. 
The peripheral is, of course, the user port. You will remember that an active tran- 
sition at the input of either CB! or CB2 (when it is an input) can set a flag in the 
IFR. If the corresponding bit in the IER is set to 1, the VIA sends the signal 
up the IRQ line and, providing the I bit in the process status register is 0, a 
full-blooded interrupt situation is established. The only difference in 
general principle in handling this kind of interrupt is the interrupt vector. It 
is no longer &0220/1. Page 466 of the User Guide names the vector as 
IRQ2V which is situated at &0206. 


Summary 


1. The user port is the ‘B’ half of the VIA. This is an interface to the 
outside world, providing a set of 8 data lines (PBO to PB7) and 2 
control lines (CBI and CB2). 

2. The port also supplies a five volt positive supply rail and an earth rail 
for each of the data and control lines. 

3. There are 16 programmable registers in the VIA, occupying addresses 
&FE60 to & FE6F in the SHEILA band. 

4. The bits within DDRB determine the direction of the 8 data lines. Data 
lines to be outputs must have ‘I’s, lines to be inputs must have ‘0’s, in 
the respective bit positions. 

5. ORB and IRB are the output and input registers for the 8 data lines. 

6. Data lines, programmed ds inputs by DDRB, are unaffected” by 
attempts to WRITE directly into them. 

7. Input data can be latched or unlatched, depending on whether bit | in 
the ACR is I| or O respectively. 
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10. 


11. 


EZ. 


13% 


14. 


15. 


. CBI is always an input but CB2 can be programmed as an input or as 


an output. 


. Bits 4 to 7 in the PCR are concerned with the behaviour of CBI and 


CB2. Bits 0 to 3 are concerned with the A side of the VIA dedicated to 
the Centronics printer. 

The IFR indicates to the programmer, by means of flags, that an 
active signal has been detected at CBI and/or CB2 input. It also has 
flags for detecting time out conditions on the timers T1, T2 and the 
completion of 8 shifts from the shift register SR. 

IER is closely associated with IFR ona bit by bit level. A set flag in IFR 
can only cause an interrupt request on the IRQ line if the corresponding 
bit in IER is at 1. 

The timers Tl and T2 are 16-bit counters, T1 having an associated 
latch. | 

Delays can be achieved, or interrupts generated, at regular intervals 
by loading starting numbers into the high- and low-bytes of the 
counters. 

PB7 can receive the output of the timers. PB6 can accept input pulses 
which can be compared with a number present in T2. 

It is possible to produce a waveform of arbitrary mark to space ratio 
at the output of PB7. 


Self test 


9.1 
9:2 
o3 


9.4 
» fe 


9.6 


What is the absolute hexadecimal address of the IFR? 

What is the absolute hexadecimal address of T2 high-byte counter? 
State the required hexadecimal contents of DDRB if all lines except 
PBS are to be outputs. 

Which of the two control lines can never be an output. 

Assume the CBI flag in the IFR is set. What additional status bits in 
the system must be set, or reset, before an actual interrupt occurs? 
How do you enable the input latch on port B. 


Appendix A 
Binary and Logic 





Binary 


@® Unless otherwise stated, numbers will be assumed to be decimal. Hex 
numbers are prefixed by &. 

@ To aid comprehension, strings of bits may be split into groups of four, 
but the space between groups is artificial. 

@ ‘X’ is used for ‘don’t care’ bits and can mean | or 0. 

@ To ‘flip’ a bit means to change it from | to 0 or vice versa. 


Unsigned binary system 
Computer languages, whether entered in high level, assembly coding, or 
hexadecimal, are incomprehensible to the machine. All information is 
converted by the resident operating system to binary bits (1s and Os). 
All number systems, including the familiar decimal, rely on the relative 
position of digits to indicate their ‘worth’. Each binary digit in a byte is 
twice the value of the bit to its right. In pure unsigned binary, the value of 
each binary | is shown below in both decimal equivalents and powers of 
two: 


a an a, or a ae, 27 796-95 94 93 92 91 30 
pada {adadafadada zi Piofafafafatafafe 


Examples: 


1000 1001 = 137 
1001 1111 = 159 
1111 1111 = 255 


Sometimes, the following tip is useful: 
A string of all ls = 2"-1, where n=number of bits in the string. 
Examples: 


1111 =2*-1= 15 
1111 1111 1111 1111 = 2'°-1 = 65535 
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It is advisable, but not essential, to memorise powers of two up to the first 
sixteen bit positions. It is convenient to consider them divided into low- 
byte and high-byte as follows: 


Powers of 2 


High-byte Low-byte 





Any binary number in the high-byte position is always 256 times its low- 
byte value. For example: 0000 1001 would be worth 9 if low-byte, but 256*9 
= 2304 if high-byte. Remember, the 6502 always stores 16-bit data in 
consecutive memory addresses, low-byte first. 


Hexadecimal notation (hex) 
Hex uses the 16 characters 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F to describe a 
nibble (4 bits): 


0000=0 0001=1 0010=2 0011=3 
0100=4 0101=5 0110=6 O111=7 
1000=8 1001=9 1010=A 1011=B 
1100=C 1101=D 1110=E I111=F 


Two hex characters describe a byte. Some examples follow: 


1111 0011 = F3 0001 1011=1B 1100 1101= CD 0000 0001 = Ol 
1111 1111 1111 1111 = FFFF 1000 1100 1010 0111 = 8CA7 


Hex arithmetic 
Hex is based on powers of 16 so any character, depending on its position, 
must be multiplied by the appropriate power of 16 as follows: 


16 =4096 16°=256 16°.=16 16 =1 

4096 256 16 | 
Using H for hex character: H H H H 
Examples: 


&0032=(3*16)+2=50 &00FC=(15*16)1+12 = 252 
&00FF=(15*16)+15=255 

& 203 E=(2* 4096)+-(3* 16)+ 14= 8254 
&1111=4096+256+ 164+ 1=4369 
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Signed binary and two’s complement | 
In order to represent both positive and negative numbers in a byte, the msb 
(bit 7) is reserved as the ‘sign’ bit. 

The sign bit is 1 for negative and 0 for positive numbers. For example: 


OXXX XXX is positive and 1XXX XXXX is negative. 


A negative number is said to be the two's complement of the equivalent 
positive and vice versa. There are two ways of obtaining the two’s 
complement of a binary number: 


(1) First flip all the bits and then add one. Ignore any carry out from msb 
end. 

(2) Starting from msb, copy up to and including the first ‘1’ then flip the 
remaining bits. 


Examples: Number Two’s complement 
(+7) 0000 OI11 1111 1001 (—7) 
(+1) 0000 0001 DL. ADE rsd) 
(—2} litt Hie 0000 0010 (+2) 


Method | can lead to errors when adding the 1, so method 2 is safer. 
The two’s complement of decimal numbers is found by subtracting from 
200. 


Example: —1= 1-256 = 255 = 1111 1111. 


The two’s complement of hex numbers is found by subtracting from & FF 
and adding I. 


Example: —3 = &FF-3 = &FCt+1 = &FD 


The largest positive number in a byte is +127 = O111 1111 = &7F. 
The largest negative number is —128 = 1000 0000 = &80. 


Notes: 

(a) The larger the negative number, the more binary 0s appear. In two’s 
complement, everything is reversed, including the relative status of Is and 
Os. 

(b) There are 128 positive and 128 negative numbers. The fact that zeroisa 
positive number is the reason why there appears to be one more negative 
than positive (—128,+127). 


Binary coded decimal (BCD) 

Decimal numbers are awkward when expressed in binary, simply because 
base 10 and base 2 don’t mix well. BCD is acode which sacrifices efficiency 
for decimal compatibility. A byte is divided into two 4-bit groups (nibbles). 
Each nibble is coded for numbers from 0 to 9, as follows: 
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BCD _ Decimal 


0000 0 
0001 l 
0010 2 
0011 3 
0100 4 
0101 5 
0110 6 
O11] 7 
1000 8 
1001 9 


The six groups from 1010 to 1111, which are used forthe characters A to F in 
hex, are illegal in BCD. A single byte can hold decimal numbers in BCD 
form only in the range, 0 to 99. 


Examples: 0001 0011=13 0000 0111=07 1001 1001=99 


The efficiency of a code = (number of combinations used)/total 
combinations). 

In pure binary, all combinations are used, so the efficiency is 100%. In 
BCD, only 10 combinations are used out of a total of 16 possible, so the 
efficiency is 10/16=63% approximately. When the efficiency within a byte 
is calculated, the loss in information content is worse, 100/256 which is not 
quite 40%. 

Because of the inefficiency of BCD, its use is limited. However, a large 
proportion of digital instrumentation delivers, or expects to receive, 
information in BCD form. The 6502 microprocessor obligingly processes 
BCD arithmetic if the D flag in the processor register is set to 1. However, it 
is up to the programmer to ensure that the data entering the arithmetic area 
is free from illegal groups. 


Logic 


Logic levels and TTL 

Logic chips contain circuits which respond to, or deliver, one of two 
possible voltage levels. A certain family of chips (known as TTL) has set a 
common standard (see Fig. A.1). All members of the TTL family (there are 
over 300 different chips), have type numbers beginning with 74 or 74LS. 
The LS prefix denotes Low-power Shockty and although similar in logic 
function they consume less current and are faster. LS is now recommended 
for general use in favour of the traditional ‘standard’ TTL. 
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Voltage 
“a 


+6 


Chip death 






+4 Logici HIGH 





Logic 0 LOW 


Ground 0 
Fig. A.7. TTL logic levels. 


Logic 1 (also known as a HIGH) = any voltage within the range +2.8 V to 
> ¥. 

Logic 0 (also known as a LOW) = any voltage within the range 0 V to 
+0.8 V. 


Any voltage in between is called a ‘bad level’ and will lead to indeterminate 
results. Bad levels are usually caused by an output over load pulling up or 
dragging down the voltage. Testing for HIGHs and LOWs at various 
points in the system can be done with a voltmeter although one of the 
various makes of ‘logic probe’ displaying either a red or green light is more 
convenient and less hazardous. 


Logic gates 

A gate is essentially a logic-operated switch with one output and one or 
more inputs. The combination of logic voltages on the inputs determines 
the output state. Although the function of a gate can be described in words, 
a truth table — with all possible input combinations - is concise and 
unequivocal. Figure A.2 shows the six common gates in their most popular 
diagrammatic form, together with the corresponding truth tables. The 
inverter is not worth a truth table. 


Notes on Fig. A.2: 

The AND gate: output | only if all inputs 1. 

The OR gate: output | only if one or more inputs are I. 

The NAND gate: output 0 only if all inputs are I. 

The NOR gate: output 0 only if one or more inputs are I. 

The INVERTER: output is reverse of input. 

The EXCLUSIVE OR: output | only if inputs are different. 


220 Advanced Machine Code Techniques for The BBC Micro 





A-—-——E--S 


INVERTER Alternative INVERTER symbol 


S 








7 
7400 7404 
Fig. A.2. The six primitive gates. 


Examination of the truth table reveals that it is similar to the OR but 
‘exludes’ the bottom AND line. 


Although only two inputs are shown at each gate in Fig. A.2, the TTL family 
include gates with as many as eight inputs. The two common chips are the 
7400 quad NAND and the 7404 hex inverter. The pin connections for these 
appear in Fig. A.2. The power supply to the chips is marked VCC (+5V) 
and Gnd (0 V) pin marked Vcc. Chip pin-out diagrams are always drawn 
looking down on to the top of the chip. 
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Active levels and negation 

More complex chips such as decoders, buffers, counters, etc., do not always 
recognise a | or HIGH state as being in some way superior to a 0 or LOW 
state. Any input terminal which is immune to a | but activated by 0 is said 
to be active-LOW. Such terminals are indicated either by (a) a bar over the 
terminal label suchas C or clock (the bar is a Boolean symbol for negation - 
for example, A is the opposite state to A) or (b) a small circle or ‘bubble’. 





Dominance of NAND and INVERTER gates 

TTL logic is based on the NAND and INVERTER, the other three gates 
tend to be under-used and therefore not so readily available. There are 
three reasons for their dominance: 


(1) Many of the more complex chips are gated on by a LOW rather than a 
HIGH in order to minimise standby current. 

(2) The internal circuitry of TTL gates is such that NAND and inverter 
functions arise more naturally and require less components. 

(3) Combinations of NAND and INVERTER can be arranged to simulate 
AND and OR gates. An AND is an inverted output NAND. An OR is a 
NAND with all inputs inverted. Even the INVERTER is not strictly 
essential because a NAND, with all inputs strapped or held permanently at 
1, behaves as an INVERTER. 


Use of gates 

Traditionally, the study of logic has leant heavily on a branch of 
mathematics known as Boolean Algebra. It is both a useful shorthand and 
a powerful tool in the mathematical analysis of logic. Boolean is still useful 
but, for the home enthusiast, the availability of complex integrated circuits 
has lessened the need to design and construct systems from an assortment 
of gates, so time spent on studying the special algebra may not always be 
justified. 

The main use (now) for logic gates is to ‘glue’ together the more complex 
chips which may be incompatible in some way. For instance, one chip may 
deliver a | where a 0 is needed, meriting an inverter in between. Another 
possibility is the need to enable a chip only if ‘something’ else is at 1. Figure 
A.3 shows some of the switching arrangements using gates. 

Figure A.3(a) shows an AND gate simulating a series switch in the data 
path. A serial data stream entering can only pass through the switch if the 
control C is HIGH. 

Ifa NAND gate is used, as in Fig. A.3(b), an inverter is needed. Without 
the inverter, the serial data stream would still pass if C is held HIGH but 
would be in inverted form (called the ‘one’s or logical complement’). 

Figure A.3(c) shows how it is possible to enable a chip providing both A 
and B inputs are held HIGH. Note that the example chip is marked CE (not 
chip enable) which is convenient for the LOW output from the NAND. 
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(a) 


Data in 





Data out 





Hh 


(b) 


Data in Data out 
eee Ss ee ee 


| 
| 
| 

C 


(d) 


Data in 






Data out 







(e) 





Data out 1 


Data in 


Data out 2 


C 
Fig. A.3. Uses of simple gates. 


Remember that the bar over the CE label is an alternative to the bubble. 

The exclusive-OR gate in Fig. A.3(d) provides an easy way of controlling 
the phase of the output data. If the control C is held LOW, the output data 
stream is a replica of the input. If C is held HIGH, the output data is an 
inverted version of the input. 
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Figure A.3(e) is a simulation of a single pole, double throw switch 
whereby the serial data stream can be diverted to either data out | or data 
out 2, depending on the state of the control C. If Cis held HIGH, the data 
emerges from the bottom gate but from the top gate if held LOW. Inverters 
would be needed at the outputs if NANDs were used instead of ANDs. 


Flip-flops 

Logic gates deliver an output state, depending on the present input 
conditions. They are combinatorial devices, acting 1n real time and capable 
of analysis by simple Boolean algebra. Flip-flops are in an entirely different 
class because their present state depends on some event (usually a logic 
pulse) which occurred in the past. From this, it should be easy to conclude 
that flip-flops have the ability to memorise. But they can’t memorise much. 
In fact, one flip-flop can only store a single bit so we would need eight of 
them to store one byte of data. A flip-flop which is storing a I is said to be 
set; if it is storing a 0, it is said to be reset. The four varieties of flip-flop in 
common use are shown in Fig. A.4. ) 


(a) 


R 
Simple SR flip-flop 


a a Hy i at 


MI 


(b) 





Toggle flip-flop 


(c) (d) 





D type flip-flop JK flip-flop 


Fig. A.4 Types of flip-flop. 
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The SR flip-flop. The logic symbol shows it to be a four-terminal black 
box. The output state is available at the ‘Q’ terminal which is set or reset by 
a negative-going pulse on S or R respectively. The term negative-going 
means a sudden drop in voltage from HIGH to LOW. It is important to 
realise that, although a transition from HIGH to LOW 1s required, it is not 
necessary to maintain the LOW state. In fact, the easiest way to try it out 
would be momentarily to touch the S terminal witha grounded wire. If it is 
already in the set state, nothing will happen. If it is in the reset state, the Q 
terminal will go from 0 to | and remain in the new state until you flick the R 
terminal. The action is similar to the push-on/push-off switch found on 
table lights — it memorises the last order. 

When a RS flip-flop is needed, it is customary to ‘make’ one from two 
cross-strapped NANDs (this only takes half a 7400). As a ‘free gift’, the Q 
terminal is always in the opposite state to Q. Knowledge of this can often 
Save an inverter. 


The T flip-flop. This is often called a ‘toggle’ because every negative-going 
edge of a pulse on T will change the state at Q — it toggles the state 
backwards and forwards. The waveforms shown on Fig. A.4(b) indicate 
that a continuous pulse of frequency f applied to T causes an output 
frequency of f/2, illustrating its primary use as a frequency divide-by-two 
stage. The Q output will be at the same half-frequency as Q but in the 
opposite phase. Direct set and reset terminals, which override T, may also 
be present in some types. 


The D flip-flop. The D stands for ‘Data’. The state at Q is oblivious to the D 
state until a trigger pulse arrives at T. When the negative going edge of the 
trigger arrives, the state of D (at that time) is passed (latched) into the flip- 
flop. In other words, the Q state is always the state which D was, prior to 
the arrival of the trigger. The 7475 is a quad D-type latch, containing four 
identical D flip-flops. Two of these can be used to latch in a byte of data. 


The JK flip-flop. This is a versatile breed of flip-flop, shown in Fig. A.4(d). 
The logic state on the J,K terminals decide the eventual state of Q after the 
next trigger pulse on T. The action is best described with the aid of the 
following truth table: 


J K State of Q after next trigger 


0 0 No change 
0 | Reset (Q=0) 
:- Set (Q=0) 

it | Change 
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Note that when J and K are both 0, the flip-flop is paralysed, unable to 
respond to any triggers. 

If J and K are both held at |, a trigger will always change the state. In other 
words, this perm of J,K transforms it to a T flip-flop. 

If J is joined to K by an inverter, it is transformed to a D flip-flop, the J 
terminal acting as a D. 

From this, it is easy to see why the JK flip-flop was described as versatile. 


Finally, it should be mentioned that some diagrams will choose different 
labels for the trigger terminal. The terminal we have marked T may, in 
some diagrams, be marked clock or just C. 


Wired-OR and tristate outputs 

A microprocessor system is based on the common bus. The output data 
from RAM, ROM, etc., are all wired in parallel across the same wires. It is 
important that such devices in the disabled state are effectively 
disconnected from the common bus. Normal TTL logic allows inputs to be 
connected together but under no circumstances must outputs be connected 
together unless they are of the class known as open-collector. Figure A.5(a) 
shows the idea behind wired-OR connections. 


(a) +5V 


| 2.2K 
Open collectors 


Common bus 


(b) 





E 
Fig. A.5. Wired-OR and tristate. 
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The output stage of normal TTL consists of two transistors in series 
across the 5 V supply (known as totem pole) with the gate output emerging 
from the centre point. The top transistor is missing in open-collector types 
and the feed to the 5 V line must come from an external pull-up resistor. 
This allows several outputs to be connected together, providing they all 
share the same pull-up resistor. Many of the popular chips in the TTL 
family are available in both standard and open-collector version. 

Although wired-OR connections are useful in odd places, the solution is 
too messy for computer bus work. The alternative, and cleaner, solution is 
to provide chips with tristate outputs as shown in Fig. A.5(b). An extra 
transistor is built in to each output line, acting as a series switch and turned 
on or off by the enable terminal. When the chip is disabled, the outputs are 
effectively removed from the bus. The TTL chips offering tristate outputs 
are normally more complex than simple gates. RAM and ROM chips are 
almost always tristate. 


Mechanical switches 

Some disconcerting effects can occur if logic voltages are applied by means 
of an ordinary mechanical switch, particularly if the terminal supplied 
expects a single pulse. Due to the natural resonance of the operating spring, 
switches bounce backwards and forwards several times before coming to 
rest in the final position. The evil is called switch-bounce and can be overcome 
by either of the following two methods: 


(1) Using an SR flip-flop and a single pole two-way switch as shown in Fig. 
A6. The flip-flop can be fashioned from the two strapped-NANDs 
previously described. 

(2) Using software, incorporating a few milliseconds delay before ‘reading’ 
the state of the switch. 


+ 5V Switch output 


Jax | ax 


Fig. A.6. Switch de-bounce circuit. 
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Driving lamps and relays | 

Small lamps are popular for displaying logic states. All lamps take current 
which can be ill afforded in logic work. Incandescent filament lamps are 
sluggish and take 50 mA or more. Neons take negligible current but require 
about 80 volts before they emit the characteristic red glow. This leaves the 
light-emitting diode (LED) as the only serious contender. They give a 
reasonable light with about 5 mA and only drop about 1.2 volts. They must 
always be fed via a series resistor somewhere in the chain in order to ensure 
current, rather than voltage, drive. They are best driven from the output of 
an inverter as shown in Fig. A.7. 


+5V 







A TIL 209 
4 (or equivalent) 


Ph 


Logic 1 here 
Fig. A.7. Feeding a LED. 


The LED lights when a | is applied to the inverter input. The inverter 
output then drops to near ground, completing the circuit through the LED. 
The output of the inverter is said to be sinking the LED current to ground. 

Devices which require current in excess of 20 mA or voltages in excess of 
5 V cannot be driven from logic circuits without help. This help can be 
supplied by the familiar electromagnetic relay, the opto-isolator or a 
combination of both. Figure A.8 shows some arrangements. 

In spite of the glamour associated with the semiconductor age, there are 
still uses for the traditional electromagnetic relay. Design methods have 
improved and the modern forms are efficient, physically small and take 
relatively low currents. Although no different in principle, the variant 
known as the reed relay, shown in Fig. A.8(c), is common-place in modern 
interface circuitry. The operating contacts are enclosed within a glass tube 
filled with inert gas, which prevents the build-up of oxidation products. 
Because of this, the contact life is much higher than in the traditional open- 
contact relay. The operating coil is a separate component slipped over the 
tube and therefore can cater for a variety of current and impedance 
requirements. 

Relays fulfil two primary requirements of the power interface: 
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(a) 


-_ 


(b) 


Load 
| 
Load power supply 
7 pe | 
Logic drive oe | 
 & | 
| 





+5V + 


| 

150 ohm l 
| 

| 


Opto isolator 
(RS 305-759) 


+5V 
—— (c) 
: Load contacts 
1N4148 _ fo 
operating coil 





glass tube 


Fig. A.8. Relays and opto-isolator drives. 


(a) They allow the weak logic output from the computer to control high 
power. 
(b) They electrically isolate the computer from high voltage circuits. 


They must never be used without a reverse diode across the operat- 
ing coil. The diode safeguards the logic circuits from induced voltages 
which appear when the current is interrupted. Figure A.8(b) shows a 
typical drive arrangement, using a common npn transistor as a current 
amplifier. The transistor conducts through the operating coil of the relay. 
The IK resistor supplying the requisite base current. The presence of the 
inverter gate means that the transistor conducts on a logic 0 input and 
switches off ona logic |. This is acase of an active-low drive causing a back- 
to-front action. If this is undesirable, the remedy is to insert an extra series 
inverter to bring it right again or substitute a non-inverting buffer gate. In 
either case, some form of logic gate is desirable in home constructed 
projects rather than a direct raw feed from the computer output port. Gates 
are cheap, computers aren't! 

The opto-isolator is another popular component in interface work. Like 
the electromagnetic relay, the objective is to isolate electrically the 
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computer from any high power/voltage/current components. In fact, the 
only connection is via the light emitted from a small LED falling on the 
base of a light-sensitive transistor. They are available singly as 6-pin chips, 
with the diode and transistor buried within the silicon. A typical circuit 
using a single opto-isolator is shown in Fig. A.8(a). The RS 305759 is only 
one of the many types available in the catalogues. 

The box marked ‘load’ is a blanket term covering any contraption driven 
by the isolator. In all probability, this will include yet another transistor 
because the opto-isolator introduces a power ‘gain’ of less than unity 
(typically 0.2). To convert the loss into a gain, some opto-isolators 
incorporate two transistors and are classified as Darlington-connected. 
Some chips are available which contain four independent opto-isolators so 
two of these could handle the output from an 8-bit port. 


Schmitt triggers 

When the logic state changes from 0 to I, or vice versa, logic chips expect 
the change to be rapid. In other words, the wave form should display, as far 
as possible, straight-sided pulses. If the input changes are sluggish, the 
behaviour could be impaired, particularly for clock-type inputs . If the 
input is obtained from the output of another logic circuit and the wiring 
between the two is not too long, there is no problem. However, if the input 
is obtained from an analogue, or ‘home-made’ source, the waveform is 
probably suspect and must be cleaned up before qualifying as a legitimate 
gate input. The 74 logic series has the answer in the form of the schmitt 
trigger, a standard circuit which accepts a poor pulse shape and transforms 
it into a steep-sided version. Figure A9 shows the gate symbol with typical 
input and output waveforms. The 7414 is a hex schmitt inverter, 
performing in the same way as a normal inverter but accepts poor 
waveforms. The schmitt does not protect against voltages which are out of 
range. It offers waveform but not voltage protection. 


IN 


Inverter symbol OUT 


Waveforms 


Fig. A.9. The schmitt trigger. 


With reference to the mention of ‘long’ wires, it is worth pointing out that 
distributed capacity across wires, or between wires and ground, is often a 
cause of weird faults. It is sometimes a source of complaint that 
manufacturers of peripheral equipment appear to be miserly in the length 
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of connecting cable supplied. In all fairness, this is not always due to cost 
penny-pinching. It is simply a wise precaution to avoid complaints of 
erratic behaviour which might arise if the cable length were increased. Apart 
from distributed capacity, the longer the wire, the more chance of picking 
up stray induced voltages. 


Timer chips 

It must be admitted that discussion of these chips is a little out of place here. 
The 555 timer chip is not strictly a logic circuit although, if operated froma 
+5V supply, it accepts and delivers reasonable TTL voltage states. It is 
versatile, very low-priced, and easy to use. We are concerned here only with 
its use as a hardware timer. That is to say, a device which, on receipt of a 
single narrow pulse, delivers an output HIGH state for a certain time 
before reverting to the quiescent LOW state automatically. Figure A10 (a) 
shows the pin connections, wiring and waveforms. 


(a) 
V+ 


Ground 


Trigger Discharge 


Output Threshold 


Reset Control 





(b) 


2 

ZN1034E 11 
10 

9 

8 


1 
2 
3 
4 
9 
6 
7 





Fig. A.70. The 555 and ZN1034E timers. 
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The 555 is ideal for cases where a single pulse from the computer can turn 
on a device (electric motor, perhaps) but is not required to stop under 
computer control. It is realised, of course, that the BBC machine includes 
programmable timer facilities so the job could have been entirely software- 
controlled without a 555. However, it is good to be aware of alternative 
possibilities. | 

The output pulse-width, which determines the ON time of the device, is 
dependent on the value of C and R according to the following formula: 


T = 1.1 CR (where C is in uF and R 1s in megohms) 


Forexample, if C= 0.1 uF and R= 100K, the ON time will be0.011 seconds. 
The figures illustrate that the 555 is not generally suitable for long time periods. It 
is not recommended to use R values greater than | M, and capacities of the 
order of some microfarads means using electrolytics with wide tolerances. 
For periods over several seconds up to minutes, it is better to use one of the 
more sophisticated timers such as the ZN1034E shown in Fig. A.10(b). The 
timing formula is: 


T = 2736 CR (where C is in uF and R in megohms) 


The multiplication factor 2736 is achieved by an internal 12-bit binary 
counter allowing time periods up to an hour or more. A useful feature is the 
provision of two complementary outputs, marked Q and Qin the diagram. 
It is a complex 14-pin chip with some of the pins allocated to external 
calibration resistors but only the simplified wiring is shown. To utilise the 
full potential, it is worth sending for detailed data sheets. 


Decoders 

A decoder will have several outputs but only one selected output can be 
activated at a time. The particular output depends on the specific 
combination applied to the selection input. Three select terminals can 
provide only eight different combinations of binary digits, the rule being: 


Number of combinations of n bits = 2n 


For example, to select any one of sixteen outputs requires four select 
inputs. There is a wide range of decoders in the TT Lseries. Inaddition to the 
select inputs, there will be one or more enabling inputs, allowing decoders to 
be linked together. Some of these may be active-high and some active-low. 
It is important to realise that all enable inputs must be activated before the 
chip becomes ‘live’. 


Demultiplexers 

A demultiplexer routes serial input data to one particular output line and 1s 
the logic equivalent to a single pole multiway switch. Like the decoder, the 
particular output selected depends on the combination supplied tothe select 


232 Advanced Machine Code Techniques for The BBC Micro 


terminals. In fact, a decoder with enable inputs can be used as a 
demultiplexer by feeding the data to one which 1s active-high. 


Multiplexers 

These are mirror images of demultiplexers. They route any one of many 
input data sources to a single output line. The particular input source 
depends on the combination applied to the select inputs. The usual enable 
terminals will be present in most TTL chips. 


Encoders 

An encoder delivers a particular binary pattern on the output terminals. 
depending on which of the many input lines is activated. For example, 
there could be ten inputs, each capable of producing a unique four-bit 
pattern on the outputs and acting as a decimal to binary encoder. Most 
microcomputer keyboards are decoded by scanning software but some of 
the more expensive types are hardware encoded. 


Counters 

A counter is essentially a device which delivers an output binary pattern 
which changes on receipt of each input pulse. The TTL range offers a wide 
variety of counters. They may be classified as follows: 


Binary counters. The input pulses cause the four-bit output to progress from 
0000 to 1111 inasimple binary sequence. The pulse starts the count again at 
0000. 


BCD counters. The input pulses cause the four-bit output to progress from 
0000 to 1001 (0 to 9 decimal). The tenth pulse starts the count again at 0000. 


All counters will be supplied with a reset-to-zero input and most supply a 
terminal which emits a pulse when the count goes over the top to 0000. This 
is useful for cascading the output of onecountertothe input of another. Two 
binary counters in cascade would then handle counts up to I111 1111 (255 
decimal) and two BCD counters up to 1001 1001 (99 decimal). There is 
another classification according to the direction of count. For example, 
those described above are up-counters but some varieties can be persuaded 
to down-count. For example, a four-bit binary down-counter has I 111 as 
the ‘reset’ state and decreases on each input pulse towards 0000. Down- 
counters are not supplied as such but some of the more sophisticated 
varieties have a control terminal which can be maintained HIGH for up- 
count and LOW for down-count. It is worth mentioning that an ordinary 
up-counter can be turned into a down-counter by inverting the outputs. 


Shift-registers 

It is self-evident that a shift-register shifts but, as with counters, there are 
generic variants depending on the direction of shift (left or right) and 
whether the initialised data is applied serially or in parallel. They will all 
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have a ‘shift’ terminal (marked T or Clock). Every pulse on T shifts the 
contents one place, bits being pushed out at one end. 

Parallel-in-parallel-outs, known as P/POs, accept parallel data on the 
four inputs, and data is available on the four output lines after the shift 
pulses have ended. The new input data Is only let in to the register whenan 
enabling level is applied to the appropriate terminal. 

Parallel-in-serial-outs, known as P/SOs, are similar to above but the 
output data can only be obtained a bit at a time on the serial output line. 

Whatever other facilities they posses, shift-registers will always have 
serial-in and serial-out terminals. Many varieties exist in the TTL range. 
Some handle 8-bits and some can shift left or right depending on the state 
of a control terminal. The most obvious use for shift-registers is for parallel 
to serial or serial to parallel conversion. 


Buffer registers 

A buffer is a temporary holding register for data, the contents of which are 
subject to a latching pulse. Typically, there will be four data inputs, four 
data Outputs and a terminal which is used to latch in the new data. Data 
variations at the input are ‘unseen’ until a latching pulse is applied when the 
current data overwrites the old. Some buffers have tristate outputs and are 
bi-directional. 


Appendix B 
Operating System Calls 





Subroutine Address Vector Function Register/s 
(name) (hex) (hex) involved 
OSWRCH- FFEE 20E Write character A 
OSRDCH ~ FFEO 210 Read character A 
OSNEWL ~ FFE7 — LF and CR to screen A 
OSASCI FFE3 ~~ Write character (NL if A=0D) A 
OSCLI PREY 208 Interprets command line + 
OSBYTE FFF4 20A All OSBYTE calls and * FX A,X,Y 
OSWORD FFF! 20C All OSWORD calls A,X,Y 
OSFILE FFDD 212 Load and save file A,X,Y 
OSBGET FFD7 216 Load and save data file A,X,Y 
OSBPUT FFD4 218 Put byte in file A,Y 
OSFIND FFCE 21C Open or close file A,X,Y 
OSGBPB FFDI 21A Multiple OSBPUT and OSBGET A,X,Y 
NVRDCH  FFCB -— Non-vectored read character 

NVWRCH_~ FFC8 — Non-vectored write character 

GSREAD ~ FFC5 — Read string character 

OSEVEN FFBF — Activate event 

GSINIT FRCZ = Initialise string input 

OSRDRM __ FFB9 = Read paged ROM byte 
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Appendix C1 
6502 Complete Instruction Set 


Add with carry | AtTM+C-A 


Address mode Op-code Bytes Cycles 






ADC 











Immediate & 69 2 2 
Zero-page &65 2 3 
Zero-page,X &75 2 4 
Absolute &6D 3 4 
Absolute, X &7D 3 4or5 
Absolute, Y & 79 3 4or5 
(Indirect, X) &6 1 2 6 
(Indirect), Y &7 | 2 5 






A and M—~A | NZ 


AND And with A 


Address mode Op-code Bytes Cycles 


Immediate & 29 2 2 
Zero-page & 25 2 3 
Zero-page,X & 35 Z 4 
Absolute &2D 3 4 
Absolute, X &3D 3 4or5 
Absolute, Y & 39 2 4or5 
(Indirect, X) & 2 | Z 6 
(Indirect), Y &3 | 2 5 
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Shift left C= (7..,.07-0- P= NZ 


Address mode Op-code _ Bytes 
Accumulator &0A l 


Zero-page &06 
Zero-page, X & 16 
Absolute &0E 
Absolute, X &1E 





Branch if C=0 Flags unaltered 


Address mode Op-code Bytes Cycles 
Relative &90 2 3 or2 


Branch if C=! Flags unaltered 


Address mode Op-code Bytes Cycles 
Relative & BO 2 3o0r2 


Branch if Z=1 Flags unaltered 


Address mode Op-code Bytes Cycles 
Relative & FO Z 3 or2 


A and M.M7=N,M6-V 


Address mode Op-code Bytes Cycles 
Zero-page & 24 Z 3 
Absolute &2C 3 4 

















Branch if N=1 Flags unaltered 


Address mode Op-code Bytes Cycles 
Relative & 30 2 3 or2 
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BNE Branch if Z=0 Flags unaltered 


Address mode Op-code Bytes Cycles 
Relative & DO 2 3 or 2 





Branch if N=0 Flags unaltered 


Address mode Op-code_ Bytes Cycles 
Relative & 10 2 3 or 2 


PC+2 I flag=1 


Address mode Op-code_ Bytes Cycles 
Implied &00 l 7 


BVC Branch if V=0 Flags unaltered 


Address mode Op-code Bytes Cycles 
Relative & 50 2 3or?2 








Break 





















Branch if V=1 





BVS Flags unaltered 










Address mode Op-code_ Bytes Cycles 
Relative &70 2 3 or2 


Clear Carry C flag=0 


Address mode Op-code_ Bytes Cycles 
Implied &18 2 











CLD Clear Decimal D flag=0 


Address mode Op-code_ Bytes Cycles 
Implied & D8 I 2 
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CLI Clear I mask 


Address mode Op-code_ Bytes Cycles 
Implied &58 I 2 


Clear overflow 


Address mode Op-code Bytes Cycles 
Implied &B8 l 2 


Address mode Op-code_ Bytes Cycles 
















Immediate & C9 2 2 
Zero-page &C5 2 3 
Zero-page, X & D5 Ps 4 
Absolute & CD 3 4 
Absolute, X & DD 3 4or 5 
Absolute, Y & DI 3 4or 5 
(Indirect, X) &C1 2 6 
(Indirect), Y & D1 2 5 or 6 







Address mode Op-code Bytes Cycles 
Immediate & EO 2 Z 
Zero-page &E4 2 3 
Absolute & EC 3 4 


Address mode Op-code Bytes Cycles 
Immediate & CO 2 2 
Zero-page ~ &C4 2 2 
Absolute &CC 3 4 
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DEC Decrement M|M-I—-M NZ 


Address mode Op-code_ Bytes Cycles 









Zero-page & C6 2 5 
Zero-page, X & D6 2 « 6 
Absolute & CE 3 6 

7 


Absolute, X & DE i 
DEX Desrement & | X-1-X 


Address mode Op-code Bytes Cycles 


Implied &CA 2 


DEY Decrement Y | Y—-1— Y NZ 


Address mode Op-code_ Bytes Cycles 





Implied &88 I 2 
| 

EOR Exclusive-OR| AexcM-A NZ 
Address mode Op-code_ Bytes Cycles J 
Immediate &49 2 2 
Zero-page &45 2 3 
Zero-page, X &55 2 4 
Absolute &4D 3 4 
Absolute, X &5D 3 4or 5 
Absolute, Y &59 3 4or 5 
(Indirect, X) &4 | 2 6 
(Indirect), Y &5 1 Z 5 


Increment M 





Address mode Op-code_ Bytes Cycles 
Zero-page & E6 2 5 
| Zero-page,X &F6 


2 6 
Absolute &EE 3 6 
Absolute, X & FE 3 7 
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Address mode 
Implied 






Address mode 
[Implied 


Address mode 


Absolute +? 
Indireét: >’ 





Absolute 





Address mode 
Immediate 
Zero-page 


Zero-page, X 
Absolute 
Absolute, X 
Absolute, Y 
(Indirect, X) 
(Indirect), Y 


Op-code 
&E8 | 2 


&C8 2 





Jump to SR Flags unaltered 


Address mode 











Bytes Cycles 







Yrrisey 


Op-code 





Bytes Cycles 






Flags unaltered 


Op-code Bytes Cycles 
&4C 3 3 
aoc >< *3 5 







Op-code Bytes Cycles 
& 20 3 6 


Op-code__ Bytes 
& AY 2 
&AS 

& BS 

& AD 

&BD 

& BY 

&A I 

&B | 
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Address mode Op-code Bytes Cycles 


Immediate &A2 2 2 
Zero-page & AG Z 3 
Zero-page, Y & B6 2 4 
Absolute &AE 3 4 
Absolute, Y & BE 3 4or5 








Address mode Op-code Bytes Cycles 
[mmediate & AD é a 


Zero-page &A4 2 3 
Zero-page,X &B4 Z + 
Absolute &AC 5 4 
Absolute, X & BC 3 4or 5 





LSR | Logical SR | 0-(7...0)-°-C |N=0,20 


Address mode Op-code_ Bytes Cycles 


Accumulator &4A | 2 
Zero-page & 46 2 5 
Zero-page,X &56 2 6 
Absolute &4E 3 6 
Absolute, X &SE 3 7 


| = A , Al ie 
POE 7VOrne 
NOP No operation | Ffags unaltered 


Address mode Op-code Bytes Cycles 
Implied &EA 2 
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Address mode Op-code_ Bytes Cycles 
Immediate &09 2 2 










Zero-page &05 Z 3 
Zero-page, X & 15 2 4 
Absolute &0D 3 4 
Absolute, X &1D 3 4or5 
Absolute, Y &19 3 4 or 5 
(Indirect, X) &0 1 2 6 
(Indirect), Y & 11 Z 5 






Address mode Op-code_ Bytes Cycles 
Implied &48 I 3 


Push status Flags unaltered 


Address mode Op-code_ Bytes Cycles 
Implied &08 I 3 










Address mode Op-code_ Bytes Cycles 
& 68 I 4 


Address mode Op-code_ Bytes Cycles 
Implied &28 I 4 








Address mode 
Accumulator 
Zero-page 
Zero-page,X 
Absolute 
Absolute, X 







Address mode 
Accumulator 
Zero-page 
Zero-page, X 
Absolute 
Absolute, X 


Address mode 
Implied 






Address mode 
Implied 


Address mode 
Immediate 
Zero-page 
Zero-page, X 
Absolute 
Absolute, X 
Absolute, Y 
(Indirect,X) 
(Indirect), Y 


lag 


Return from SR 


&60 I 6 


Subtract 
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Op-code Bytes Cycles 
&2A | 2 
&26 Z 5 
& 36 2 6 
&2E 3 6 
&3E 3 7 






Op-code Bytes Cycles 
&6A l 2 
& 66 2 5 
& 76 2 6 
&6E 3 6 
&7E 3 7 







Flags as pulled 


Op-code Bytes Cycles 
& 40 l 6 











Flags unaltered 


Op-code Bytes Cycles 








a-M-C- A NZCV 





Op-code Bytes Cycles 
& ED 2 2 
& ES Z 3 
& FS 2 4 
& ED 3 4 
& FD 3 4 or 5 
& FY 3 4or5 
&EI 2 6 
& FI 2 5 or 6 
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Address mode Op-code Bytes Cycles 
Implied & 38 I 2 


Address mode Op-code_ Bytes Cycles 
Implied &F8 l 2 


| seme fet 


Address mode Op-code_ Bytes Cycles 
Implied &78 l 2 












STA Flags unaltered 


Address mode Op-code Bytes Cycles 


Zero-page &85 2 3 
Zero-page, X &95 2 4 
Absolute &8D 3 4 
Absolute, X &9D 3 2 
Absolute, Y &99 3 5 
(Indirect, X) &8 | Z 6 
(Indirect), Y &9] 2 6 


Address mode Op-code Bytes Cycles 
Zero-page &86 2 3 
Zero-page, Y &96 Z 4 
Absolute © &8E 3 4 







6502 Complete Instruction Set 







Address mode Op-code_ Bytes Cycles 







Zero-page &84 2 3 
Zero-page, X &94 2 4 
Absolute &8C 3 4 










Transfer 


Address mode Op-code Bytes Cycles 
Implied &AA I Z 


Pat 


Address mode Op-code_ Bytes Cycles 
Implied &AS8 I 


Address mode Op-code_ Bytes Cycles 
Implied &98 I 2 


sx | tamer | sr-x | nz 


Address mode Op-code Bytes Cycles 
Implied & BA I 2 


Address mode Op-code Bytes Cycles 
[Implied &8A I 2 
















245 
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TXS Transfer X—-SP Flags unaltered 


Address mode Op-code Bytes Cycles 
Implied &IA l 2 


Appendix C2 
6502 Instruction Set: Classification by processor flag 


Updates N,Z and C flags: 
ADC,ASL,CMP,CPX,CPY,ROL,ROR,SBC. 


Updates N and Z flags: 
AND,DEC,DEX,DEY,EOR,INC,INX,INY,LDA, 
LDX,LDY,ORA,PLA,TAX,TAY,TYA,TSX,TXA. 


Updates N,Z,C and V flags: 
ADC,SBC. 


Updates N,C and clears N: 
LSR. 


Op-codes not mentioned above either: (a) have no effect on processor flags 
or 

(b) set or reset certain flags by direct programming 

CLO CLDCLLCEV SEC SESE). 


Appendix C3 
6502 Instruction Set: Classification by addressing modes 


Immediate: 
ADC,AND,CMP,CPX,CPY,EOR,LDA,LDX,LDY,ORA,SBC 


Zero-page: | 
ADC,AND,ASL,BIT,CMP,CPX,CPY,DEC,EOR,INC,LDA,LDX, 


LDY,LSR,ORA,ROL,ROR,SBC,STA,STX,STY 


Zero-page, X: $ 
ADC,AND,ASL,CMP,DEC,EOR,INC,LDA,LDY,LSR,ORA,ROL, 
ROR;SBC,STA,STY 
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Absolute: | 4 
ADC,AND,ASL,BIT,CMP,CPX,CPY,DEC,EOR,INC,JMP.JSR, 
LDA,LDX,LDY,LSR,ORA,ROL,SBC,STA,STX,STY 


Absolute, X: S 
ADC,AND,ASL,CMP,DEC,EOR,INC,LDA,LSR,ORA,ROL,ROR, 
SBC,STA | 


Absolute, Y: L 
ADC,AND,CMP,EOR,LDA,LDX,ORA,SBC,STA 


(Indirect, X): 4 q e3\ 
ADC,AND,CMP,EOR,LDA,ORA,SBC,STA | Suv . 


(Indirect), Y: 4 
ADC,AND,CMP,EOR,LDA,ORA,SBC,STA 


Accumulator: 
ASL,LSR,ROL,ROR 


Implied: Pi 
BRK,CLC,CLD,CLI,CLV,DEX,DEY,INX,INY,NOP,PHA,PHP,PLA, * 
PLP,.RTI,.RTS,SEC,SED,SEI,TAX,TAY,TSX,TXA,TXS,TYA 


Relative: y 
BCC,BCS,BEQ,BMI,BNE,BPL,BVC,BVS 


The following instructions have no effect on status flags: 
BCC,BCS,BEQ,BMI,BNE,BPL,BVC,BVS,JMP,JSR,NOP,PHA,PHP, 
RIS,STA,STA,STY,FAS 


Appendix C4 
6502 Instructions in order of common usage 


In common use: 

ADC BCC BCS BNE CLC CMP CPX CPY DEX DEY INX 
INY LDA LDX LDY RTS 

SBC SEC STA STX STY TAX TAY TYA TAA 


Often used: 
BEQ ASL BMI BPL DEC INC JMP JSR LSR_ PLA 
PHA ROL ROR 


Sometimes used: 
AND BIT BRK BVC BVS CLV EOR NOP ORA 


Seldom used: 
CLD PHP PLP RTI SED SEI TSX TXS 
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The above classification must not be taken too seriously. It is very much a 
question of personal preference and programming style. It is doubtful if 
two writers would ever agree. However, it may still be useful, particularly if 
you are in the initial learning phase. 


Appendix D 
Glossary of Terms 


absolute address: the numerical number identifying an address. 

accumulator: the main register within the microprocessor and the only one 
equipped for arithmetic. 

ACR: abbreviation for Auxiliary Control Register. One of the VIA 
registers. 

active high: any input which requires a logic | to turn it on. 

active low: any input which requires a logic 0 to turn it on. 

address bus: the 16 lines from the microprocessor which activate the 
selected memory location or device. 

address: a number which is associated with a particular memory location. 
This number can be in decimal or hexadecimal. 

and gate: a gate which delivers a logic | out only if all inputs are logic 1. 

anding: using a mask to ensure selected bits become or remain 0. 

assembler mnemonics: a three-letter group uniquely defining an op code. 

assembler: a program which converts a program written in assembly code 
to the equivalent machine code. 

base address: the operand address of an indexed instruction. 

base: the number of different characters used in a counting system. 
Decimal is base 10, binary is base 2 and hex is base 16. 

bit: one of the two possible states of a binary counting system, | or 0. 

block diagram: a simplified diagram of an electrical system using 
interconnected labelled boxes. 

Boolean algebra: an algebraic notation, introduced by George Boole, for 
manipulating two-state logic. 

bubble sort: sorting an array by pairs at a time until all data is in order. 

bus: a collection of wires having some common purpose such as data bus, 
address bus and control bus. 

byte: a group of 8 bits. 

Centronics: trademark for a standardised parallel interface for printers. 

chip: accepted slang for an integrated circuit. 

compiler: system software which translates a program written in high level 
language into a machine code equivalent. The entire program is 
translated before it is run. 
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conditional assembly: when parts or all of the assembled code can vary 
depending on test conditions. 

Darlington: a two-transistor configuration used to multiply the current 
gain. 

data bus: the 8 lines from the microprocessor which carry the data to and 
from memory or external devices. 

DDRB: abbreviation for Data Direction Register B. One of the VIA 
registers. 

decimal: the normal counting system using the ten characters 0,1, ...9. 

decoder: a logic device with many possible outputs, only one of which can 
be activated at a time. This depends on the logic pattern applied to 
the ‘select’ inputs. 

direct addressing: the operand is a two-byte address as distinct from zero 
page addressing which is a single byte address. Also called absolute 
addressing. 

disassembler: a program which will display a machine code program in 
assembly language. The opposite process to assembly. 

effective address: the sum of the base and relative address. 

exclusive or gate: a gate which delivers a logic | only if the inputs are at 
different logic states. 

exclusive oring: using a mask to ensure that selected bits assume the 
opposite state. 

firmware:programs already in ROM. 

flag: a single bit used to indicate whether something has happened or not 
(see program status register). 

handshaking: a term used to describe the method of synchronising an 
external device to the computer. 

hardware: all the bits and pieces of a computer such as the chips, circuit 
board, keys, etc. That which you can see, feel and break. 

hex: see hexadecimal. 

hexadecimal: a counting system using sixteen characters 0,1,...9,A,B,C, 
DE,F. 

high byte: the most significant half of a two-byte number. 

high-level language: a language written in the form of statements, each 
statement being equivalent to many machine code instructions. BASIC 
is a high level language. 

IER: abbreviation for Interrupt Enable Register. One of the VIA registers. 

IFR: abbreviation for Interrupt Flag Register. One of the VIA registers. 

immediate addressing: the operand is the data itself rather than an address. 

implicit address: see implied address. 

implied address: an address which is inherent in the op-code, therefore 
requiring no following operand. 

index register: either the X or Y register when used to modify an address. 

indexed address: an address which has been formed by the addition of an 
index register’s contents. 
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indexed indirect addressing: the indirect address is the sum of the operand 
and contents of Y. | 

indirect addressing: the operand refers to an address in page zero which is 
the address of the wanted data. 

indirect indexed addressing: the indirect address is modified by the 
addition of Y. 

instruction register: a register within the microprocessor holding the op- 
code during instruction decoding. 

integer: a whole number without a fraction. 

integrated circuit: a chip containing a number of interconnected circuits. 

interpreter: system software which translates and executes each high level 
language statement separately. BASIC is normally interpreted although 
compiler versions exist. 

IRB: abbreviation for Input Register B. One of the VIA registers. 

latch: a buffer register which retains old data until new data is enabled. 

logic gates: electrical circuits which behave as switches. The input 
conditions determine whether the switch is ‘open’ or ‘closed’. 

low byte: the least significant half of a two-byte number. 

low-level language: a series of codes rather than a language, each line 
resulting in one order to the microprocessor. 

sb: the least significant bit in the byte (the right-most bit). 

LSI: large scale integration. Normally taken to mean in the order of tens of 
thousands of circuits on a single chip. The 6502 microprocessor is LSI. 

machine code: strictly, this term should be used for instructions written in 
binary; now used loosely to include hex coding and assembly language. 

macro: a routine assembled in line each time it is called. 

mask: a bit pattern used in conjunction with either AND,EOR or ORA to 
act on selected bits within a byte. 

merge sort: similar to bubble sort but faster due to progressive halving of 
the array before sorting into pairs. 

microprocessor: the integrated circuit which is the central processor or 
‘brain’ of the computer. The BBC machine uses the 6502 species. 

microprogram: a program inside the microprocessor which informs it how 
to carry out each machine code instruction. 

mnemonics: code groups chosen so we can memorise them easily. 

MOB: abbreviation for Moving OBject. Any screen object which is 
destined to be moved. 

msb: the most significant bit in the byte (the leftmost bit). 

msi: medium scale integration. Normally taken to mean up to a few 
hundred circuits on a single chip. 

nibble: a group of 4 bits. 

nybble: see nibble. 

object code: the translated version of the source code. 

one’s complement: a number formed by changing the state of all bits ina 
register. 
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op-code: abbreviation for operational code. It is that part of a machine 
code instruction which tells the computer what kind of action is required. 

operand: that part of a machine code instruction which gives the data or 
where to find the data. 

operating system: the software already in ROM which is designed to help 
you use the computer. 

or gate: a gate which delivers a logic out if any one or more inputs are 
logic |. 

ORB: abbreviation for Output Register B. One of the VIA registers. 

oring: using a mask to ensure selected bits become or remain 1. 

OSBYTE: keyword for Operating System Byte. Allows machine code calls 
to the operating system. 

OSRDCH: keyword for Operating System Read Character. A subroutine 
for reading a character from selected input systems. 

OSWORD: keyword for Operating System Word. Similarto OSBYTE but 
allows more parameters to be passed. 

OSWRCH: keyword for Operating System Write Character. Passes 
character to selected output system. 

page one address: any address within the range 256 to 511 decimal or 0100 
to OFF hex. 

PC: see program counter. 

PCR: abbreviation for Peripheral Control Register. One of the VIA 
registers. 

PIA: abbreviation for the 6820 Peripheral Interface Adaptor. 

pixel: a small picture element. 

program counter: the only 16-bit register in the 6502. Contains the address 
of the next instruction byte. 

program status register: a register containing flag bits which indicate if 
overflow, carries, etc. have been caused by the previous instruction. 

PSR: see program status register. 

read: to examine the existing data in a register or memory location, usually 
by means of LDA,LDX or LDY. 

relative address: the contents of the index register. 

resident assembler: an assembler which is already in ROM when you 
purchase the machine. 

resident subroutines: those in ROM which you can use, providing you 
know their starting address. 

ROM: abbreviation for Read Only Memory. Information stored is 
permanent even when the power supply is off. 

rotate: similar to shift but any bit pushed out from the carry is reinserted at 
the other end. 

rpn: abbreviation for ‘reverse Polish notation’, which is concerned with the 
order in which numeric variables are processed by a machine. 

RS423: a standardised interface which passes data serially along a single 
line. 
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scrolling: movement of the screen vertically or horizontally in order to 
bring fresh data into view. | 

shift: to move the bit pattern, one place to the left or right. 

signed binary: the binary system which uses the msb as a sign bit. 

silicon chip: most chips are fabricated froma silicon base although some of 
the super-fast modern varieties may be using a mixture of gallium and 
arsenic. | 

software: general term for all programs. 

source code: the program in its high level form. 

sprite: a screen object destined to be moved, together with accompanying 
coordinate data. Similar to MOB. 

SR: abbreviation for Shift Register. 

ssi: small scale integration. Normally taken to mean a few circuits, often 
simple logic gates, on a single chip. 

subroutine: a program segment which will normally have general-purpose 
use and which can be used in other programs. 

supply rail: a wire, feeding several components with a specific voltage. 

symbolic address: anarbitrarily chosen name used in place of the numerical 
address. It is only recognised if it has been previously assigned to this 
number. 

tristate. logic devices which can be either in the HIGH, LOW or open 
circuit state. When in the open circuit state, the output of the device is 
transparent to a common bus line. 

TTL: abbreviation for Transistor Transistor Logic, a family of compatible 
logic chips operating on 5 volts. First launched by Texas Instruments 
but soon second-sourced by other manufacturers. 

two-pass assembly: passing the source code twice through the assembler. 
Essential if branches are to forward addresses. 

two’s complement: a number formed by adding | to the one’s complement. 
Used for negative number representation. 

unsigned integer: a binary number without using the msb as a sign bit. 

user port: one of the output sockets which can be used to control your own 
special devices. 

user subroutines: subroutines which you can make for yourself. 

vector: a word in memory containing the address of an operating system 
routine. 

VIA: abbreviation for the 6522 Versatile Interface Adaptor chip. 

volatile memcry: one which loses all data when power is interrupted. 

write: to place new data into a register or memory location, usually by 
means of STA,STX or STY. The old data is overwritten by the new. 

X register: a general-purpose register which can be used in indexed 
addressing. 

Y register: similar to X register. 

zero page address: any address within the range 0 to 255 decimal or 00 to 
FF hex. 


Answers to Self Test 
Questions 





Chapter One 


1.1 
1.2 


i.3 
1.4 
1.6 
‘7 
1.8 
1.9 
1.10 
1.11 
1.12 
1.13 
1.14 
1.15 


1.16 
1.17 
1.18 
1.19 


1.20 
1.21 
L2z 


FF. 

Because the delay caused by the electronics, even when using serial 
transmission, is negligible in relation to the printer mechanics. 
The transmission cable can be much longer. 

General term is ‘transducer’. 

1.8 volts. 

FF. 

FC=FRED.FD=JIM.FE=SHEILA. 

The ‘B’ side. 

Key *HELP. ov #¥%6 


SEI. 

8 for ASCII and 2 for START/STOP. 

0.6 volts. 7 

Any address within the range FCO3 to FCE3. 

0.63 volts. 

Same as before, 4F. It doesn’t change until the next clock pulse 


arrives. 

The bar negates the logic. For example, NOT A can be written as A. 
Chip enable. 

63. 

FCFF is the paging register in the FRED band, used for supplying the 
JIM page address. 

Resident peripheral interfaces. 

The | MHz bus. : 

Primarily reserved for extra ROM/RAM. “" | 


Chapter Two 


2.1 
bud 


Dynamic (DRAMs). 
Expensive and lower packing density. 


2.3 


2.4 
2 


2.6 

2.7 

2.8 

2 

2.10 
2.11 
ZAZ 
2D 
2.14 
Z.15 
2.16 
2.17 
2.18 
219 
2.20 
Z.o1 
Pee 
po 
2.24 
2.25 
2.26 
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Each peripheral is activated by a specific address (or addresses), 
rather than by a special op-code. 7 
By special op-code in the instruction set. 

It supports the true arithmetical operations, addition and subtrac- 
tion. Many of the instructions only perform on the accumulator. 
&2D03. 

Each bit is an independent flag. 

D,I and C. 

D,I. 

N,V,B and Z. 

When working in unsigned binary. 

It is activated by bit 6 of the data during the BIT test. 

The last instruction left non-zero data. 

‘The € bit. 

Page | or RAM. 

Via the X register, using TXS. 

Before. 

ia 

High-byte. 

The program counter. 

The next higher address. 

R/W (sometimes written R/NW). 

Fixed by the microprocessor designers. 

The Instruction Register, IR. 

If by software BRK, the B bit is set in the status register. 

The 8080. 


Chapter Three 


3.1 


ey 
3.3 
3.4 
35 
3.6 
Sut 
3.8 
EB 
3.10 
3.11 
3.12 
3.13 


STA memory. 

ASL A. 

ADC memory. 
AND #&DF. 

EOR #&48. 

ORA #&04. 

& EE. 

Double-byte operand not allowed in immediate addressing. 
Program counter. 
112 to 143 inclusive. 
&11. 

JMP. 

X register. 

Indirect indexed. 

& 74. 
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3.14 &79. 
3.15 Post-indexed indirect. 


Chapter Four 


4.1 Full-stop between BNE and Label is wrong. 
4.2 OPT. 

43 3. 

4.4 No, because &83 is a negative relative address. 
4.5 (a)02. (b) 04 (c) 05. 

4.6 (a)56 (b) 12. 

4.7 PRINT ~!34587. 


Chapter Five 


§.1 0000 0011. 
§.2 0000 1001. 
5.3 &FF. 

5.4 2000 million. 
5.5 No. 


Chapter Six 


6.1 Wordlength, clock frequency, instruction repertoire, operating system 
and language interpreter. 


No formal answers are applicable to tests 6.2 to 6.5, 7.1 to 7.5 and 8.1 
to 8.4. 


Chapter Nine 


9.1 &FE6D. 

9.2 &FE69. 

9.3 &DF. 

9.4 CBI. | 

9.5 Bit 4 and bit 7 inthe IER must be set and the I-bit in the process status 
register must be reset. 

9.6 Set bit | inthe ACR. 





6502 microprocessor 33 
6502 vectors 44 

a/d conversion |] 
absolute addressing 63 


absolute indexed addressing 66 


accumulator 37 

active levels 221 

address bus 19 

address coordinates 179 
address decoding 23 
address modification 38 
address pointer 69 
address register 45 
addressing modes 62 
ALU 47 

application note |, 22 
arithmetic instructions 56 
array variable storage 114 
ASIA I] 

assembler 77 


bank switching 6 
benchmarks 33 

binary 215 

binary coded decimal 217 
branch labels 79 

bubble 21 

bubble sorts 114 

buffer registers 233 


calling 84 

chip enable 19 

clearing registers 57 
clock 20 

code efficiency 218 
comparisons 61 
composite video 5 
conditional assembly 89 
control lines 19 

current page 28 


data bus 19 

data transfers 37 

decode matrix 47 
decoders 231 
demultiplexers 231 
direct screen 173 
direction register 197 
double-byte working 101 


encoders 232 
execute phase 44 
external interrupts 212 


fetch phase 44 
fixed-length records 144 
fixed locations 82 

flip flops 223 

floating point array storage 139 
floating point merge sorts 138 
forward branching 85 
four-byte addition 103 
four-byte subtraction 105 
Fred address 16 


generations 2 
glossary 249 


hardware scrolling 185 
hex op-code 54 
hexadecimal 216 


immediate addressing 62 
implied addressing 62 

indexed addressing 66 
indexed indirect addressing 72 
indexing 38 

indirect addressing 68 

indirect indexed addressing 70 
indirection operators 97 
instruction register 44 


instruction set 235 
instructions 52 

integer array addition 108 
internal interrupts 210 
interrupt 8 

interrupt enable register 204 
interrupt flag register 202 
interrupt mask 9 


Jim 28 


LEDs 227 
logic gates 219 
logic levels 218 
Isi 2 


macros 87 

memory mapping 14 
merge sorts 126 
microprograms 45 
mnemonic op-codes 78 
moving objects 169 
multi-byte counting 106 
multi-colour MOBs 177 
multifield sorts 143 ° 
multiplexers 232 


nibble 173 


one’s complement 59 
operand 52 

operand variables 78 
operating system calls 234 
operation symbols 54 
opto-isolator 228 


page boundary 29 
paging register 30 
parameter passing 90 
pixel 173 

power pack 5 

process status register 39 
program counter 42 
PROM programmers 7 
pseudo-variables 82 


read/ write 20 


Index 


refreshing 34 

relative addressing 65 
relays 227 

remarks 80 

reset line 20 

Reverse Polish 37 
ROM chips 6 

RS423 10 


saving code 98 
schmitt trigger 229 
screen output 5 
second processor 3 
Sheila addresses 14 
shift registers 232 
sideways scrolling 185 
Silicon Valley 2 
software interrupt 48 
sorting times 154 
stack pointer 41 

static RAM 35 
storing code 80 

string bubble sort 120 
switch bounce 226 


timers 205 

tristate 225 

TTL? 

two’s complement 59 
two-dimensional sorts 148 


UHF5 

UHF modulator 5 
up/down counting 57 
user port 7 


VDU codes 162 
vectors 158 

vertical scrolling 190 
VIA 7 

VIA registers 195 


wired-OR 225 
word length 3 


zero-page addressing 64 
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HOW TO GET THE BEST FROM THE BBC MICRO! 


The potential capabilities of the BBC Micro are enormous, 
but you need to know machine code to achieve maximum 
versatility and performance. Machine code enables you to 
generate faster moving graphics, save memory and use 
peripherals and ancillary equipment most effectively. 


This book takes you much further than A P Stephenson's 
introductory ‘Discovering BBC Micro Machine Code’. It sets 
out all the hardware and microprocessor theory you need 
to make the best use ofthe BBC Micro's excellent built-in 
assembler. Numerous coding examples, worked 
exercises and hints on practical projects - including special 
effects and interfacing - are provided. 


This detailed compendium of techniques will prove of 
lasting value to all BBC Micro users who want to master 
their machine completely. 


The Authors : 

A P Stephenson has a long and distinguished record as a 
writer on electronics and computing for the enthusiast. He is 
a regular contributor to the popular computing journals and 
is the author of four other books. He is particularly well 
known as a writer on machine code. 


D J Stephenson is another life-long computer enthusiast. He 
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